Preface


The first course in Basic Principles introduced the concept of an imaging chain and
considered the links concerned with the sources of radiation and the materials used to
capture the image. This course considers the links that collect the radiation (optics) and
process the data.

Chapter 1

Review  for  Optics 


1.1  Optics:  Introduction  and  Review 

The  science  of  optics  is  often  divided  into  three  classifications  based  on  the  scale  of 
the  phenomena  considered. 

I.  Geometrical  Optics  (Ray  Optics):  Macroscopic-scale  Phenomena 

considers light to be a RAY that travels in a straight line until it encounters
an interface between media. The wavelength $\lambda$ and frequency $\nu$ of the light are
assumed to be zero and infinity, respectively: $\lambda \to 0$, $\nu \to \infty$;
explains reflection and refraction;
useful for designing imaging systems;
more difficult to use for assessing the quality of the resulting image.

II.  Physical  Optics  (Wave  Optics):  Microscopic-scale  Phenomena 

considers  light  (electromagnetic  radiation)  to  be  a  WAVE; 
the  action  of  light  is  described  by  Maxwell’s  equations; 

light is a traveling wave with wavelength $\lambda$, temporal frequency $\nu$, and phase
velocity $c$;

leads to explanations of reflection, refraction, diffraction, interference, polarization,
and dispersion. Useful for assessing the quality of the images.

III.  Quantum  Optics:  Atomic-scale  Phenomena 

light  is  a  photon,  has  both  wave-like  and  particle-like  characteristics; 
used  to  analyze  the  interaction  of  light  and  matter  on  a  sub-microscopic  level; 
explains  the  photoelectric  effect,  lasers. 

Phenomena  in  the  first  two  of  these  categories  are  most  relevant  to  imaging;  we 
will  ignore  the  third. 

Referenced  Sources:  Optics  Texts: 

[H] E. Hecht, Optics, 2nd Edition, Addison-Wesley, 1987.

[KF] M.V. Klein and T.E. Furtak, Optics, 2nd Edition, Wiley, 1986.

[JW] F. Jenkins and H. White, Fundamentals of Optics, 4th Edition, McGraw-Hill, 1976.

[NP] A. Nussbaum and R. Phillips, Contemporary Optics for Scientists and
Engineers, Prentice-Hall, 1976.

[I] K. Iizuka, Engineering Optics, Springer-Verlag, 1985.

[FBS] D. Falk, D. Brill, and D. Stork, Seeing the Light, Harper and Row, 1986.

Physics Texts:

[HR] D. Halliday and R. Resnick, Physics, 3rd Edition, Wiley, 1978.

[C] F. Crawford, Waves, Berkeley Physics Series Vol. III, McGraw-Hill, 1968.

John D. Jackson, Classical Electrodynamics, 3rd Edition, Wiley, 1998, §6.


Chapter  2 


Review:  Oscillations 


Sources: HR §15, C §1

Before  discussing  the  wave  nature  of  light,  it  is  important  to  review  the  salient 
characteristics  of  oscillations  and  waves  from  basic  physics. 

Oscillation  -  periodic  variation  of  any  characteristic  of  a  physical  system  about 
some  equilibrium  (mean)  value 

e.g.,  position  angle  of  a  pendulum  bob  in  a  gravitational  field 

position  of  a  mass  on  a  spring 

voltage  across  the  capacitor  in  an  LC  circuit 


[Figure: Pendulum Oscillator; Oscillator Built from a Capacitor and an Inductor]

The position angle $\theta$ of the pendulum bob and the voltage across the capacitor
plates (or the current in the circuit, or the magnetic field generated by the inductor, ...)
oscillate as functions of time. The position angle of the pendulum varies about its
mean value (the vertical defined by the gravitational field) as a periodic function of
time.

Oscillations  result  from  the  joint  presence  of  two  forces: 

1. Inertia: displaces the physical quantity (e.g., the position angle of the pendulum
or the voltage in the LC oscillator) from its equilibrium value.

2. Restoring (or return) force: opposes changes in the physical quantity and acts
to return it to equilibrium. The greater the deviation from equilibrium, the larger
the restoring force (it acts as negative feedback that increases with the
deviation from equilibrium).

Oscillations of matter can be either transverse or longitudinal (or some combination):

1. Longitudinal oscillation: the vectors describing the opposing forces are parallel,
e.g., a mass attached to a spring, restricted to motion toward or away
from the spring;

2. Transverse oscillation: the vectors describing the two forces are not parallel,
e.g., the pendulum, where the inertial force is horizontal and the restoring force is
vertical, or the unrestricted motion of a mass on a helical spring.


2.1  Harmonic  Oscillations 

The simplest oscillations are harmonic, which means that the function describing the
oscillation is composed of a single sinusoidal frequency (usually defined as a cosine
rather than a sine because this is more compatible with complex notation). For example,
consider the position angle of a pendulum in a gravitational field as a function
of time:

$$ y[t] - y_0 = A_0 \cos\{\Phi[t]\} = A_0 \cos[\omega_0 t + \phi_0] $$


$y$ - the "position" of the characteristic of the medium, e.g., an angle, voltage,
etc.;

$y_0$ - equilibrium value of the characteristic;

$A_0$ - amplitude of the oscillation, i.e., maximum displacement from equilibrium;
the units of $A_0$ are the same as those of $y$: $[A_0] = [y]$;

$\omega_0$ - angular temporal frequency of the oscillation, units are
$[\omega_0] = \frac{\text{radians}}{\text{second}}$;

$\nu_0$ - temporal frequency of the oscillation, units are
$[\nu_0] = \frac{\text{cycles}}{\text{second}}$ = Hertz [Hz], $\nu_0 = \frac{\omega_0}{2\pi}$;

$T$ - period of the oscillation, units are $[T]$ = seconds,
$T = \frac{1}{\nu_0} = \frac{2\pi}{\omega_0}$;

$\Phi$ - phase angle of the oscillation (the argument of the sinusoid) in radians;

$\phi_0$ - initial phase of the oscillation, i.e., phase angle measured at the origin
of coordinates $t = 0$, units of $[\phi_0] = [\Phi]$ = radians.


2.2  Harmonic  Oscillations  —  Energy  Considerations 


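Given the equation of motion of a simple system, e.g., $y[t] = A_0 \cos[\omega_0 t + \phi_0]$, the
velocity, acceleration, and force exerted by the system can be calculated by taking
derivatives:

Velocity:

$$ v = \frac{dy}{dt} = \dot{y} = \frac{d}{dt}\left(A_0 \cos[\omega_0 t + \phi_0]\right) $$
$$ = A_0 \frac{d}{dt}\left(\cos[\omega_0 t + \phi_0]\right) = A_0\,\omega_0 \cdot \left(-\sin[\omega_0 t + \phi_0]\right) = -A_0 \omega_0 \sin[\omega_0 t + \phi_0] $$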

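Acceleration:

$$ a = \frac{d^2 y}{dt^2} = \ddot{y} = -A_0 \omega_0^2 \cos[\omega_0 t + \phi_0] $$

Inertial Force:

$$ ma = m\ddot{y} = -m\left(\omega_0^2 A_0 \cos[\omega_0 t + \phi_0]\right) $$
$$ = -m\omega_0^2 \left(A_0 \cos[\omega_0 t + \phi_0]\right) $$
$$ = -m\omega_0^2 \cdot y[t] = -ky $$

where

$$ k = m\omega_0^2 $$

is the "Force Constant" of the restoring force. The force equation may be rearranged to:

$$ m\ddot{y} + ky = 0 $$

This is the equation of motion for the simple harmonic oscillator.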

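From these equations, it is easy to derive the potential and kinetic energies of the
harmonic oscillator:

Kinetic Energy $\mathcal{E}_k$:

$$ \mathcal{E}_k[t] = \frac{1}{2} m \dot{y}^2 = \frac{m}{2}\left(-\omega_0 A_0 \sin[\omega_0 t + \phi_0]\right)^2 $$
$$ \mathcal{E}_k[t] = \frac{m A_0^2 \omega_0^2}{2} \sin^2[\omega_0 t + \phi_0] $$

Potential Energy $\mathcal{E}_p$:

$$ \mathcal{E}_p[t] = -\int_0^y F \cdot ds = -\int_0^y (-ky')\, dy' = \frac{1}{2} k y^2 = \frac{1}{2} m \omega_0^2 y^2 $$
$$ \mathcal{E}_p[t] = \frac{m A_0^2 \omega_0^2}{2} \cos^2[\omega_0 t + \phi_0] $$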

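Total Energy $\mathcal{E}[t]$ is the sum of the kinetic and potential energies:

$$ \mathcal{E}[t] = \mathcal{E}_k[t] + \mathcal{E}_p[t] = \frac{m A_0^2 \omega_0^2}{2} \sin^2[\omega_0 t + \phi_0] + \frac{m A_0^2 \omega_0^2}{2} \cos^2[\omega_0 t + \phi_0] $$
$$ = \frac{m A_0^2 \omega_0^2}{2} \left( \sin^2[\omega_0 t + \phi_0] + \cos^2[\omega_0 t + \phi_0] \right) = \frac{m A_0^2 \omega_0^2}{2} $$

which is a constant over time!

$$ \mathcal{E} = \frac{m A_0^2 \omega_0^2}{2} $$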


Observations: 

1. $\mathcal{E}$ is not a function of time, i.e., the total energy is constant.

2. $\mathcal{E}_k$ and $\mathcal{E}_p$ are never negative; each oscillates between 0 and $\mathcal{E}$.

3. $\mathcal{E} \propto A_0^2$, the energy is proportional to the square of the amplitude.

4. $\mathcal{E} \propto \omega_0^2$, the energy is proportional to the square of the frequency: higher
frequency $\Rightarrow$ more energy.

5. $\omega_0^2 = k/m$ is the return force per unit displacement per unit mass.
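The observations above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `oscillator_energies` and the parameter values are illustrative assumptions, and the displacement is measured from equilibrium ($y_0 = 0$).

```python
import math

def oscillator_energies(m, A0, omega0, phi0, t):
    """Return (kinetic, potential) energy of y(t) = A0*cos(omega0*t + phi0)."""
    phase = omega0 * t + phi0
    y = A0 * math.cos(phase)               # displacement from equilibrium
    v = -A0 * omega0 * math.sin(phase)     # velocity dy/dt
    k = m * omega0 ** 2                    # force constant k = m*omega0^2
    return 0.5 * m * v ** 2, 0.5 * k * y ** 2

# Illustrative (assumed) values: mass, amplitude, angular frequency, initial phase.
m, A0, omega0, phi0 = 2.0, 0.5, 3.0, 0.4
expected_total = 0.5 * m * A0 ** 2 * omega0 ** 2

# Sample the energies at 50 instants and sum kinetic + potential at each one.
samples = [oscillator_energies(m, A0, omega0, phi0, 0.1 * n) for n in range(50)]
totals = [ek + ep for ek, ep in samples]
max_deviation = max(abs(total - expected_total) for total in totals)
```

Here `max_deviation` should be at the level of floating-point round-off, while the individual kinetic and potential terms vary between zero and the total.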

2.2.1  Anharmonic  Oscillations 

Oscillations  also  may  be  anharmonic ,  or  nonharmonic.  This  simply  means  that  the 
characteristic  of  the  physical  system  varies  in  a  nonsinusoidal  manner.  For  example: 


[Figure: plot of y(t); Anharmonic Oscillation Is NOT Sinusoidal]

The mathematical formulas for the motion and energy of the anharmonic oscillator
are identical to those for the harmonic oscillator, but the derivatives and integrals
are much more complicated to calculate. Fortunately, as we shall see, virtually any
periodic function can be decomposed into a sum of harmonic functions. Recall that
differentiation is linear, i.e.,

$$ \frac{d}{dt}\left(f[t] + g[t]\right) = \frac{df}{dt} + \frac{dg}{dt} $$

Therefore,  the  derivatives  of  each  component  may  be  taken  separately  and  summed 
to  find  the  derivatives  of  the  result. 

The  decomposition  of  a  function  into  its  component  frequencies  is  known  as  Fourier 
analysis,  and  will  be  discussed  in  more  detail  later. 


2.3  Representations  of  Harmonic  Oscillations 


Since harmonic oscillators exhibit sinusoidal motion, they may certainly be described
by trigonometric functions as above:

$$ y[t] = A_0 \sin[\omega_0 t + \phi_0] = A_0 \cos\left[\frac{\pi}{2} - (\omega_0 t + \phi_0)\right] = A_0 \cos\left[\omega_0 t + \phi_0 - \frac{\pi}{2}\right] $$

where the second expression arises because $\sin\theta = \cos\left[\frac{\pi}{2} - \theta\right]$ and the last expression
from the symmetry of the cosine (i.e., $\cos[-\theta] = \cos[+\theta]$). This description of
oscillations is perfectly acceptable, and it leads to all the correct results, but it can be awkward
to keep a math handbook handy to recall the necessary expressions for the cosine
and/or sine of sums, differences, and/or products of angles. The notation becomes
even more complicated when considering the superposition (sum) of many oscillations
or waves. For example, how easy is it to find the resultant of the sum of two oscillators,
$y_1[t] + y_2[t]$, where $y_i = A_i \sin[\omega_i t + \phi_i]$? For unit amplitudes, you can look this up to find:

$$ y_1[t] + y_2[t] = \sin[\omega_1 t + \phi_1] + \sin[\omega_2 t + \phi_2] $$
$$ = 2 \sin\left[\frac{\omega_1 + \omega_2}{2}\, t + \frac{\phi_1 + \phi_2}{2}\right] \cos\left[\frac{\omega_1 - \omega_2}{2}\, t + \frac{\phi_1 - \phi_2}{2}\right] $$

but this result is easy to derive by using complex notation, as shown in the next
section.
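As a quick sanity check of the sum-to-product identity for two unit-amplitude sinusoids, the sketch below compares the direct sum with the product form at several instants. The function names and the sample frequencies and phases are illustrative assumptions, not taken from the notes.

```python
import math

def sum_direct(w1, p1, w2, p2, t):
    """Direct sum sin(w1*t + p1) + sin(w2*t + p2)."""
    return math.sin(w1 * t + p1) + math.sin(w2 * t + p2)

def sum_product_form(w1, p1, w2, p2, t):
    """Same quantity written as 2*sin(mean phase)*cos(half phase difference)."""
    mean = 0.5 * ((w1 + w2) * t + (p1 + p2))
    half_diff = 0.5 * ((w1 - w2) * t + (p1 - p2))
    return 2.0 * math.sin(mean) * math.cos(half_diff)

# Assumed example values for the two oscillators.
w1, p1, w2, p2 = 2.0, 0.3, 5.0, -1.1
worst = max(
    abs(sum_direct(w1, p1, w2, p2, 0.37 * n) - sum_product_form(w1, p1, w2, p2, 0.37 * n))
    for n in range(20)
)
```

The two forms should agree to within round-off; the product form makes the "carrier" (mean frequency) and "envelope" (half the difference frequency) explicit.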


Chapter  3 

Review:  Complex  Numbers 


H  §2 

The complex representation offers many mathematical advantages over the trigonometric
expression for oscillators.

Complex numbers arise from imaginary numbers. Since there is no real number
solution for $\sqrt{-1}$, the imaginary number $i$ is arbitrarily assigned as the solution, i.e.,

$$ i = \sqrt{-1} \implies i^2 = -1 $$

Complex Number: A complex number $z$ is an ordered pair of real numbers
$[a, b] = a + ib$:

$a$ is the real part of $z$ ($\mathrm{Re}\{z\}$) and $b$ is the imaginary part ($\mathrm{Im}\{z\}$).

Complex Conjugate: The complex conjugate of a complex number $z = a + ib$
is defined as $z^* = a - ib$, i.e., simply replace $i$ with $-i$ wherever it appears!

Complex Arithmetic: Given two complex numbers $z_1 = a_1 + ib_1$ and $z_2 =
a_2 + ib_2$, the following arithmetic rules apply:

1. Equality: $z_1 = z_2$ if and only if $a_1 = a_2$ and $b_1 = b_2$;

2. Addition: $z_1 + z_2 = (a_1 + ib_1) + (a_2 + ib_2) = (a_1 + a_2) + i(b_1 + b_2)$ (add real
and imaginary parts separately, $\mathrm{Re}\{z_1 + z_2\} = a_1 + a_2$, $\mathrm{Im}\{z_1 + z_2\} = b_1 + b_2$);

3. Multiplication:

$$ z_1 \cdot z_2 = (a_1 + ib_1) \cdot (a_2 + ib_2) $$
$$ = a_1 a_2 + a_1(ib_2) + a_2(ib_1) + (ib_1)(ib_2) $$
$$ = (a_1 a_2 - b_1 b_2) + i(a_1 b_2 + a_2 b_1) $$
$$ \mathrm{Re}\{z_1 z_2\} = a_1 a_2 - b_1 b_2 $$
$$ \mathrm{Im}\{z_1 z_2\} = a_1 b_2 + a_2 b_1; $$


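4. Reciprocal: (use this trick) multiply $\frac{1}{z_2}$ by 1 in this form:

$$ \frac{z_2^*}{z_2^*} = \frac{a_2 - ib_2}{a_2 - ib_2} = 1, \text{ assuming that } z_2 \neq 0, \text{ and thus that } z_2^* \neq 0 $$

to obtain the reciprocal of $z_2$:

$$ \frac{1}{z_2} = \frac{1}{z_2} \cdot \frac{z_2^*}{z_2^*} = \frac{1}{a_2 + ib_2} \cdot \frac{a_2 - ib_2}{a_2 - ib_2} = \frac{a_2 - ib_2}{a_2^2 + b_2^2} $$
$$ = \frac{a_2}{a_2^2 + b_2^2} + i\,\frac{-b_2}{a_2^2 + b_2^2} $$
$$ \mathrm{Re}\left\{\frac{1}{z_2}\right\} = \frac{a_2}{a_2^2 + b_2^2} $$
$$ \mathrm{Im}\left\{\frac{1}{z_2}\right\} = \frac{-b_2}{a_2^2 + b_2^2} $$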


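The magnitude and phase of the reciprocal are:

$$ \left|\frac{1}{z_2}\right| = \sqrt{\left(\frac{a_2}{a_2^2 + b_2^2}\right)^2 + \left(\frac{-b_2}{a_2^2 + b_2^2}\right)^2} = \sqrt{\frac{a_2^2 + b_2^2}{\left(a_2^2 + b_2^2\right)^2}} = \frac{1}{\sqrt{a_2^2 + b_2^2}} = \frac{1}{|z_2|}, \text{ if } a_2 \neq 0 \text{ or } b_2 \neq 0 $$

$$ \Phi\left\{\frac{1}{z_2}\right\} = \tan^{-1}\left[\frac{-b_2 / (a_2^2 + b_2^2)}{a_2 / (a_2^2 + b_2^2)}\right] = \tan^{-1}\left[\frac{-b_2}{a_2}\right] = -\tan^{-1}\left[\frac{b_2}{a_2}\right] = -\Phi\{z_2\} $$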


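5. Division: Apply multiplication and the reciprocal to obtain:

$$ \frac{z_1}{z_2} = \frac{a_1 + ib_1}{a_2 + ib_2} = \frac{z_1}{z_2} \cdot \frac{z_2^*}{z_2^*} = \frac{a_1 + ib_1}{a_2 + ib_2} \cdot \frac{a_2 - ib_2}{a_2 - ib_2} = \frac{(a_1 a_2 + b_1 b_2) + i(a_2 b_1 - a_1 b_2)}{a_2^2 + b_2^2} $$
$$ \mathrm{Re}\left\{\frac{z_1}{z_2}\right\} = \frac{a_1 a_2 + b_1 b_2}{a_2^2 + b_2^2} $$
$$ \mathrm{Im}\left\{\frac{z_1}{z_2}\right\} = \frac{a_2 b_1 - a_1 b_2}{a_2^2 + b_2^2} $$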


6. The real and imaginary parts of $z$ can be expressed in terms of $z$ and $z^*$:

$$ \mathrm{Re}\{z\} = \frac{1}{2}(z + z^*) $$
$$ \mathrm{Im}\{z\} = \frac{1}{2i}(z - z^*) $$

The magnitude of $z$ is defined as

$$ |z| = \sqrt{z \cdot z^*} $$
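The reciprocal and division rules can be compared against Python's built-in complex type. This is only an illustrative sketch: the helper names `reciprocal` and `divide` are assumptions introduced here, and they work on explicit real and imaginary parts to mirror the formulas above.

```python
def reciprocal(a, b):
    """Return (Re, Im) of 1/(a + ib), obtained by multiplying by z*/z*."""
    denom = a * a + b * b          # z times z* = |z|^2
    if denom == 0:
        raise ZeroDivisionError("reciprocal of zero is undefined")
    return a / denom, -b / denom

def divide(a1, b1, a2, b2):
    """Return (Re, Im) of (a1 + i*b1)/(a2 + i*b2)."""
    denom = a2 * a2 + b2 * b2
    if denom == 0:
        raise ZeroDivisionError("division by zero")
    return (a1 * a2 + b1 * b2) / denom, (a2 * b1 - a1 * b2) / denom

# Compare with the built-in complex arithmetic for one example.
re_q, im_q = divide(1.0, 2.0, 3.0, -4.0)
builtin_q = complex(1.0, 2.0) / complex(3.0, -4.0)

re_r, im_r = reciprocal(3.0, -4.0)
builtin_r = 1 / complex(3.0, -4.0)
```

For this example, $(1 + 2i)/(3 - 4i) = (-5 + 10i)/25 = -0.2 + 0.4i$, which both routes should reproduce.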


3.1 Graphical Representation of Complex Numbers


As you learned in high-school algebra, any ordered pair of numbers can be located
on a two-dimensional (2-D) graph, e.g., using the Cartesian coordinates $[x, y]$. The
y-axis becomes the imaginary axis, i.e., all values along y are multiplied by $i = \sqrt{-1}$.
Such a plot is sometimes called an Argand diagram.

For  example, 


$$ z_1 = 1 + i $$
$$ z_2 = 2 + i $$
$$ z_3 = z_1 + z_2 = 3 + 2i $$
$$ z_4 = z_1 - z_2 = -1 $$


Just as in algebra, we can also represent the Cartesian ordered pair $[a_1, b_1]$ in a polar
notation $z_1 = (A_1, \phi_1)$, where $A_1$ is the magnitude of the vector $[a_1, b_1]$ and $\phi_1$ is its polar
angle (or phase angle):

$$ \text{magnitude: } A_1 = |z_1| = \sqrt{z_1 \cdot z_1^*} = \sqrt{(a_1 + ib_1)(a_1 - ib_1)} = \sqrt{a_1^2 + b_1^2} $$

$$ \text{phase: } \phi_1 = \tan^{-1}\left[\frac{b_1}{a_1}\right] = \tan^{-1}\left[\frac{\mathrm{Im}\{z_1\}}{\mathrm{Re}\{z_1\}}\right] $$

(n.b., there is a subtle problem with this definition for the phase: the inverse tangent
is defined on the interval $\left(-\frac{\pi}{2}, +\frac{\pi}{2}\right)$, i.e., the range is only $\pi$ radians, whereas $\phi$ is
defined over a range of $2\pi$ radians.)

For a general complex number $z = a + ib$ with magnitude $A$ and phase $\phi$:

$$ \mathrm{Re}\{z\} = \mathrm{Re}\{a + ib\} = a = A \cos[\phi] $$
$$ \mathrm{Im}\{z\} = \mathrm{Im}\{a + ib\} = b = A \sin[\phi] $$

Magnitude (a real number):

$$ |z| = \sqrt{a^2 + b^2} = A $$

$$ z = \mathrm{Re}\{z\} + i\, \mathrm{Im}\{z\} $$
$$ = A \cos[\phi] + A\,(i \sin[\phi]) $$
$$ = A\,(\cos[\phi] + i \sin[\phi]) $$

$$ z = a + ib = [a, b] = (A, \phi) $$
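The range problem with the inverse tangent noted above is commonly handled in software with a two-argument arctangent, which inspects the signs of both parts. The sketch below uses Python's `math.atan2` for this; the helper names `to_polar` and `naive_phase` are illustrative assumptions.

```python
import math

def to_polar(a, b):
    """Return (magnitude, phase) of a + ib, phase in (-pi, +pi]."""
    return math.hypot(a, b), math.atan2(b, a)

def naive_phase(a, b):
    """Single-argument arctangent; only covers (-pi/2, +pi/2)."""
    return math.atan(b / a)

# z = -1 - i lies in the third quadrant, at phase -3*pi/4.
magnitude, phase = to_polar(-1.0, -1.0)
wrong_phase = naive_phase(-1.0, -1.0)   # gives +pi/4: the quadrant is lost

# Round trip back to Cartesian form using Re = A cos(phi), Im = A sin(phi).
a_back = magnitude * math.cos(phase)
b_back = magnitude * math.sin(phase)
```

The naive form returns the same angle for $-1 - i$ and $1 + i$ because the ratio $b/a$ is identical, which is exactly the ambiguity of $\pi$ radians described in the note.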


3.2  Euler  Relation  —  Complex  Exponentials 


H §2 pp. 19-21, Schaum's Outline Complex Variables §1, Schaum's Outline Optics §1

Complex numbers are very conveniently denoted as exponentials, which makes multiplication
easy. Represent $z$ in its polar form:

$$ z = (r, \phi) = r\,(\cos[\phi] + i \sin[\phi]) $$

This expression can be written as the complex exponential $z = r e^{i\phi}$ by means of the Euler relation:

$$ \cos[\theta] + i \sin[\theta] = e^{i\theta} $$


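Proof: Consider $z = [r \cos\theta, r \sin\theta] = (r, \theta)$

$$ z = r\,(\cos\theta + i \sin\theta) $$
$$ dz = (\cos\theta + i \sin\theta)\, dr + r\,(-\sin\theta\, d\theta + i \cos\theta\, d\theta) $$
$$ = (\cos\theta + i \sin\theta)\, dr + r\,(-\sin\theta + i \cos\theta)\, d\theta $$
$$ = r\,(\cos\theta + i \sin\theta)\, \frac{dr}{r} + r\,(i^2 \sin\theta + i \cos\theta)\, d\theta $$
$$ = z\, \frac{dr}{r} + i\, r\,(\cos\theta + i \sin\theta)\, d\theta = z \left( \frac{dr}{r} + i\, d\theta \right) $$
$$ \implies \frac{dz}{z} = \frac{dr}{r} + i\, d\theta $$

Integrating from $z = 1$ (where $r = 1$ and $\theta = 0$):

$$ \int_1^z \frac{dz'}{z'} = \log_e z = \int_1^r \frac{dr'}{r'} + i \int_0^\theta d\theta' = \log_e r + i\theta $$
$$ \implies e^{\log_e z} = e^{\log_e r + i\theta} = e^{\log_e r} \cdot e^{i\theta} $$
$$ \implies z = [r \cos\theta, r \sin\theta] = r e^{i\theta} $$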


A  different  proof  of  Euler’s  relation,  for  those  who  know  power-series  expansions: 


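$$ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = \frac{x^0}{0!} + \frac{x^1}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots $$
$$ = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \quad [\text{where } 0! = 1] $$

$$ \cos\theta = 1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots \implies \cos\theta \approx 1 \text{ as } \theta \to 0 $$

$$ \sin\theta = \frac{\theta}{1!} - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots \implies \sin\theta \approx \theta \text{ as } \theta \to 0 $$

$$ e^{i\theta} = 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \cdots $$
$$ = 1 + i\theta + \frac{i^2 \theta^2}{2!} + \frac{i^2 \cdot i\, \theta^3}{3!} + \frac{i^2 \cdot i^2 \cdot \theta^4}{4!} + \cdots $$
$$ = 1 + i\theta - \frac{\theta^2}{2!} - \frac{i\theta^3}{3!} + \frac{\theta^4}{4!} + \cdots $$
$$ = \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right) + i \left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) $$
$$ = \cos\theta + i \sin\theta. $$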

As an aside, the approximations for cosine, sine, and tangent of small angles may be
evaluated from the leading terms of the series. For $\theta \to 0$:

$$ \cos[\theta] \approx 1 $$
$$ \sin[\theta] \approx \theta $$
$$ \tan[\theta] = \frac{\sin[\theta]}{\cos[\theta]} \approx \theta $$

Graphs of $\theta$, $\sin[\theta]$, and $\tan[\theta]$ are compared in the figure:

[Figure: Plots of $\theta$, $\sin[\theta]$, and $\tan[\theta]$ for $|\theta| < 1$ radian, showing that the three functions are
approximately equal for $|\theta| \lesssim \frac{\pi}{10} \approx 0.31$ radians.]
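How good the small-angle approximations are can be quantified with a few lines of code. The sketch below is an illustration only; the function name `relative_errors` is an assumption, and the two test angles ($\pi/10$ and 1 radian) are chosen to match the range discussed for the figure.

```python
import math

def relative_errors(theta):
    """Relative error of replacing sin(theta) and tan(theta) by theta (theta > 0)."""
    sin_err = abs(math.sin(theta) - theta) / theta
    tan_err = abs(math.tan(theta) - theta) / theta
    return sin_err, tan_err

# Near the limit quoted for the figure, the errors are a few percent.
small_sin_err, small_tan_err = relative_errors(math.pi / 10.0)

# At 1 radian the approximation has clearly broken down, especially for tan.
large_sin_err, large_tan_err = relative_errors(1.0)
```

The tangent departs from $\theta$ faster than the sine does because its leading correction term, $\theta^3/3$, is twice as large as the sine's $\theta^3/6$.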


3.3 Arithmetic of Complex Exponentials

1. equality:

$z_1 = A_1 e^{i\phi_1}$ is equal to $z_2 = A_2 e^{i\phi_2}$ if and only if $A_1 = A_2$ and $\phi_1 = \phi_2$ (to within a multiple of $2\pi$)


2. addition:

$$ z_1 + z_2 = A_1 e^{i\phi_1} + A_2 e^{i\phi_2} $$


3. multiplication:

$$ z_1 z_2 = A_1 e^{i\phi_1} A_2 e^{i\phi_2} = A_1 A_2\, e^{i(\phi_1 + \phi_2)} $$


4. division:

$$ \frac{z_1}{z_2} = \frac{A_1 e^{i\phi_1} e^{-i\phi_2}}{A_2} = \frac{A_1}{A_2}\, e^{i(\phi_1 - \phi_2)} $$
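The multiplication and division rules say that magnitudes multiply or divide while phases add or subtract. A short check with Python's standard `cmath` module (using `cmath.rect` to build a number from polar form and `cmath.polar` to recover magnitude and phase) is sketched below; the numerical values are arbitrary examples.

```python
import cmath

# Two complex numbers specified in polar form (magnitude, phase in radians).
A1, phi1 = 2.0, 0.3
A2, phi2 = 0.5, 1.1
z1 = cmath.rect(A1, phi1)
z2 = cmath.rect(A2, phi2)

# Product: magnitudes multiply, phases add.
prod_mag, prod_phase = cmath.polar(z1 * z2)

# Quotient: magnitudes divide, phases subtract.
quot_mag, quot_phase = cmath.polar(z1 / z2)
```

Note that `cmath.polar` reports the phase in the interval $(-\pi, +\pi]$, so for other inputs the sum or difference of phases may come back shifted by a multiple of $2\pi$.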


3.3.1  De  Moivre’s  Theorem 

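Generalization of multiplication of complex exponentials:

$$ z^n = [r(\cos\phi + i \sin\phi)]^n = r^n [\cos\phi + i \sin\phi]^n $$
$$ = r^n (\cos[n\phi] + i \sin[n\phi]) \quad \text{(proof by induction)} $$

A representation of $e^{-i\theta}$ can be derived using De Moivre's theorem or the series
expansion of $e^{i\theta}$:

$$ e^{-i\theta} = e^{i(-\theta)} = 1 + i(-\theta) + \frac{(i(-\theta))^2}{2!} + \frac{(i(-\theta))^3}{3!} + \frac{(i(-\theta))^4}{4!} + \cdots $$
$$ = 1 - i\theta - \frac{\theta^2}{2!} + i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} - \cdots $$
$$ = \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right) - i \left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) $$
$$ = \cos[\theta] - i \sin[\theta] $$

$$ e^{-i\theta} = \cos[\theta] - i \sin[\theta] = \left(e^{+i\theta}\right)^* $$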
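De Moivre's theorem and the conjugate relation can both be spot-checked numerically. In this sketch the function name `de_moivre` and the sample values of $r$, $\phi$, $n$, and $\theta$ are illustrative assumptions.

```python
import cmath
import math

def de_moivre(r, phi, n):
    """Evaluate z**n for z = r*(cos(phi) + i*sin(phi)) via De Moivre's theorem."""
    return complex(r ** n * math.cos(n * phi), r ** n * math.sin(n * phi))

r, phi, n = 1.2, 0.7, 5
via_theorem = de_moivre(r, phi, n)
via_power = cmath.rect(r, phi) ** n        # direct repeated multiplication

# exp(-i*theta) should equal the complex conjugate of exp(+i*theta).
theta = 0.9
negative_exponent = cmath.exp(-1j * theta)
conjugate_of_positive = cmath.exp(1j * theta).conjugate()
```

Both comparisons should agree to within floating-point round-off.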


Examples: 
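$$ \theta = 0 \implies e^{i0} = 1 \text{ because } \cos[0] = 1 \text{ and } \sin[0] = 0 $$

$$ \theta = \frac{\pi}{2}: \quad e^{i\frac{\pi}{2}} = \cos\left[\frac{\pi}{2}\right] + i \sin\left[\frac{\pi}{2}\right] = i $$

$$ \theta = \pi: \quad e^{i\pi} = \cos[\pi] + i \sin[\pi] = -1 $$

A harmonic oscillation may therefore be written as the real part of a complex exponential:

$$ y[t] = A \cos[\omega t + \phi] = \mathrm{Re}\left\{A e^{i(\omega t + \phi)}\right\} = \mathrm{Re}\{z[t]\} $$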

As  will  be  discussed,  products  and  sums  of  same-frequency  harmonic  oscillations 
are  easily  computed. 

3.4  Description  of  Harmonic  Oscillations  via  the 
Euler  relation 

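To illustrate the utility of complex exponentials for describing harmonic oscillators,
consider the action of $z[t] = A e^{i\omega t}$ as a function of time:

$$ z[t = 0] = A e^{i \cdot 0} = A $$

$$ z\left[t = \frac{\pi}{4\omega} = \frac{T}{8}\right] = A e^{i\frac{\pi}{4}} = A \left( \cos\left[\frac{\pi}{4}\right] + i \sin\left[\frac{\pi}{4}\right] \right) = \frac{A}{\sqrt{2}} (1 + i) $$

$$ z\left[t = \frac{\pi}{2\omega} = \frac{T}{4}\right] = A e^{i\frac{\pi}{2}} = A \left( \cos\left[\frac{\pi}{2}\right] + i \sin\left[\frac{\pi}{2}\right] \right) = A \cdot i $$

$$ z\left[t = \frac{\pi}{\omega} = \frac{T}{2}\right] = A e^{i\pi} = A \left( \cos[\pi] + i \sin[\pi] \right) = A \cdot (-1) = -A $$

$$ z\left[t = \frac{3\pi}{2\omega} = \frac{3T}{4}\right] = A e^{i\frac{3\pi}{2}} = A \left( \cos\left[\frac{3\pi}{2}\right] + i \sin\left[\frac{3\pi}{2}\right] \right) = -iA $$

As $t$ increases, the complex function describes a circle of radius $A$ about the origin.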
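The tabulated values of $z[t] = A e^{i\omega t}$ can be reproduced directly. The sketch below is illustrative only: the function name `rotating_phasor` and the values of $A$ and $\omega$ are assumptions made for the example.

```python
import cmath
import math

def rotating_phasor(A, omega, t):
    """Value of z(t) = A*exp(i*omega*t)."""
    return A * cmath.exp(1j * omega * t)

A, omega = 2.0, 3.0
T = 2.0 * math.pi / omega                  # period of the oscillation

# Sample at t = 0, T/8, T/4, T/2, 3T/4 (keys are the fraction of a period).
fractions = (0.0, 0.125, 0.25, 0.5, 0.75)
samples = {f: rotating_phasor(A, omega, f * T) for f in fractions}

# Every sample lies on the circle of radius A about the origin.
radii = [abs(z) for z in samples.values()]
```

The real part of each sample is the cosine oscillation $A \cos[\omega t]$, and the imaginary part is the corresponding sine.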
If the vector rotates in the direction of $+\phi$ with increasing time, then the oscillation
frequency is positive; if the vector rotates in the direction of $-\phi$ with increasing time, the
frequency is negative. The temporal frequency is proportional to the rate of change
of phase. In words, the faster the oscillation, the more rapidly the phase changes:

$$ \omega = \frac{d\Phi[t]}{dt} $$

where $\Phi[t]$ is the phase of the complex function. Since the phase has "units" of
radians, its temporal derivative has dimensions of radians per unit time. The quantity
$\omega$ is the angular temporal frequency. Since there are $2\pi$ radians per cycle, the angular
temporal frequency may be converted to temporal frequency $\nu$ via:

$$ \nu = \frac{d\Phi[t]}{dt} \left[\frac{\text{radians}}{\text{second}}\right] \cdot \frac{1}{2\pi \left[\frac{\text{radians}}{\text{cycle}}\right]} $$

The temporal frequency is proportional to the time derivative of the phase, which shows directly
that the temporal frequency $\nu_0$ is negative if the phase decreases with increasing time.

$$ \nu = \frac{1}{2\pi} \frac{\partial \Phi[t]}{\partial t} \left[\frac{\text{cycles}}{\text{second}}\right] $$


3.5  Oscillations  as  Projections  of  Circular  Harmonic 
Motion 

The sum of the two harmonic oscillations $\cos[\omega t]$ and $i \sin[\omega t]$ yields uniform
circular motion. Because the sine term is imaginary, it is oriented at right angles to
the (real) cosine term. The imaginary part of the motion can also be rewritten:

$$ \sin[\omega t] = \cos\left[\frac{\pi}{2} - \omega t\right] = \cos\left[-\left(\omega t - \frac{\pi}{2}\right)\right] = \cos\left[+\omega t - \frac{\pi}{2}\right] \quad \text{(because cosine is even)} $$

$$ \implies y[t] = \cos[\omega t] + i \cos\left[\omega t - \frac{\pi}{2}\right] $$

Thus: uniform circular motion results from the addition of two harmonic oscillations
at right angles and with a phase difference of $\frac{\pi}{2}$ radians = 90°.

Conversely: the projection of uniform circular motion in any direction yields harmonic
motion. The initial phase of the harmonic motion is determined by the azimuth
of projection.

For example, when projecting onto the real axis, the information about variation
along the imaginary axis is ignored:

$$ \mathrm{Re}\{y[t]\} = \cos[\omega t]. $$

Projection onto the imaginary axis discards information about variation along the
real axis, and the result is:

$$ \mathrm{Im}\{y[t]\} = \sin[\omega t] = \cos\left[\omega t - \frac{\pi}{2}\right] $$


3.6  Phasor  Notation  for  Oscillations 


H  §7.3 

The  interpretation  of  harmonic  motion  as  a  projection  of  uniform  circular  motion 
leads  to  a  third  method  for  representing  oscillations  -  the  phasor.  Its  use  is  quite 
popular  in  electrical  engineering  applications. 

The phasor with magnitude $A$ and phase $\Phi[t_0]$ is denoted by the polar vector
$(A, \Phi[t_0])$ that describes the instantaneous position of the oscillator on the 2-D plot
(Argand diagram). As time progresses, the phasor of an oscillator rotates with period
$T = \frac{1}{\nu} = \frac{2\pi}{\omega}$. Generally, the phasor picture portrays the amplitude and phase of the
oscillator at a particular time $t_0$ (usually $t_0 = 0$ seconds).

Since  the  phasors  of  same-frequency  oscillators  rotate  at  the  same  rate,  their 
relative  phase  is  invariant.  Therefore,  the  phasor  picture  is  useful  for  describing  the 
relative  amplitudes  and  phases  of  two  or  more  oscillators  with  the  same  frequency. 
Also,  it  is  useful  for  finding  the  resultant  of  the  superposition  of  the  same-frequency 
oscillators,  as  will  be  shown. 




3.7  Superposition  of  Oscillations 


H §7, §14

When two (or more) oscillations (or waves) are present at the same location in a
medium at the same time, the resultant motion is (obviously) some combination of the
two component oscillations (or waves). The simplest combination of the components
(and the most common for electromagnetic oscillations or waves) is the superposition,
or sum. When the principle of superposition holds, the response is said to be linear,
i.e., the resultant $y[t]$ is the linear combination of the components $y_1[t] + y_2[t]$.
The principle of superposition holds for acoustic and electromagnetic waves in most
common situations (e.g., EM waves in a vacuum).


3.7.1 Digression: Nonlinear Optics and Second-Harmonic Generation

To help illustrate linear media and the principle of superposition, we will first consider
an example where superposition is not valid. There are situations and media that
can generate a resultant that is not a linear combination of the components. This
effect has developed into the field of nonlinear optics. For example, a high-energy
laser may be focused on one of a class of crystals (such as quartz or potassium dihydrogen
phosphate, KDP) that generate some emerging energy proportional to the square of
the sum of the incident electric fields:

$$ E[t] \propto (E_1 \cos[\omega_1 t] + E_2 \cos[\omega_2 t])^2 $$
$$ = E_1^2 \cos^2[\omega_1 t] + E_2^2 \cos^2[\omega_2 t] + 2 E_1 E_2 \cos[\omega_1 t] \cdot \cos[\omega_2 t] $$

As we will shortly demonstrate, the first two terms on the right-hand side can also
be written:

$$ E_1^2 \cos^2[\omega_1 t] = \frac{E_1^2}{2} (1 + \cos[2\omega_1 t]) $$
$$ E_2^2 \cos^2[\omega_2 t] = \frac{E_2^2}{2} (1 + \cos[2\omega_2 t]) $$

These individual pieces are sums of sinusoids that oscillate with angular temporal
frequencies $2\omega_1$ and $2\omega_2$, respectively, and constant terms (cosines that "oscillate"
with frequency equal to zero). The third term on the right also may be rewritten:

$$ 2 E_1 E_2 \cos[\omega_1 t] \cos[\omega_2 t] = E_1 E_2 \{\cos[(\omega_1 + \omega_2) t] + \cos[(\omega_1 - \omega_2) t]\}. $$

In words, the electromagnetic interactions in this nonlinear medium generate sinusoids
that oscillate with frequencies that differ from the original frequencies: sinusoids
with zero frequency, sinusoids with twice the frequencies of the component functions, and sinusoids
with the sum and difference frequencies. If both input beams have the same frequency
$\omega_0$ (and thus the same wavelength $\lambda_0$), there will be an output component with
frequency $\omega' = 2\omega_0$ and wavelength $\lambda' = \frac{\lambda_0}{2}$. For example, a laser rod composed of Yttrium-Aluminum-Garnet
doped with Neodymium (a Nd:YAG laser) can lase to make a beam with
$\lambda = 1.06\ \mu\text{m}$ (in the "near-infrared" region of the spectrum). If the laser beam has
sufficient energy and is directed onto a crystal that has a strong "nonlinear" response,
an output beam may be produced at the "frequency-doubled" wavelength $\lambda' = 0.53\ \mu\text{m}$,
i.e., visible green light. Such an effect is called second-harmonic generation and is a
very active research area in quantum optics.

Though nonlinear effects are of great interest in optics today, we will just consider
situations where the principle of superposition is valid, i.e., where the output is the sum of the
component terms.
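The frequency content of the squared (nonlinear) response can be illustrated with a small discrete Fourier transform. This is a sketch under stated assumptions, not a model of a real crystal: the input frequencies `f1` and `f2` (in cycles per sampling window), the amplitudes, and the helper names are all illustrative choices, and the DFT is written out by hand so that only the standard library is needed.

```python
import cmath
import math

N = 64                    # number of samples in one window
f1, f2 = 5, 8             # input frequencies in cycles per window (assumed values)
E1, E2 = 1.0, 0.5         # input amplitudes (assumed values)

def squared_response(n):
    """Square of the sum of the two input sinusoids at sample n."""
    s = (E1 * math.cos(2.0 * math.pi * f1 * n / N)
         + E2 * math.cos(2.0 * math.pi * f2 * n / N))
    return s * s

signal = [squared_response(n) for n in range(N)]

def dft_magnitude(x, k):
    """Normalized magnitude of the k-th DFT coefficient of the sequence x."""
    total = sum(xn * cmath.exp(-2j * math.pi * k * n / len(x)) for n, xn in enumerate(x))
    return abs(total) / len(x)

# Frequencies (in cycles per window) that carry non-negligible energy.
present = [k for k in range(N // 2 + 1) if dft_magnitude(signal, k) > 1e-9]
# expected: [0, 3, 10, 13, 16], i.e. 0, f2 - f1, 2*f1, f1 + f2, 2*f2
```

The original frequencies 5 and 8 are absent from the output list, while the zero-frequency, doubled, sum, and difference frequencies appear, in line with the trigonometric expansion above.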


3.8 Superposition of Same-Frequency Oscillations

3.8.1  Trigonometric  Notation 

Consider  the  linear  superposition  of  two  oscillations  with  the  same  frequency  and 
different  amplitudes  and  phases: 

$$ y_1[t] = A_1 \cos[\omega t + \phi_1] $$
$$ y_2[t] = A_2 \cos[\omega t + \phi_2] $$
$$ y[t] = y_1[t] + y_2[t] $$

The trigonometric solution of the resultant $y[t]$ can be found as follows:

$$ y_1 + y_2 = A_1 (\cos[\omega t] \cos[\phi_1] - \sin[\omega t] \sin[\phi_1]) + A_2 (\cos[\omega t] \cos[\phi_2] - \sin[\omega t] \sin[\phi_2]) $$
$$ = \cos[\omega t] (A_1 \cos[\phi_1] + A_2 \cos[\phi_2]) - \sin[\omega t] (A_1 \sin[\phi_1] + A_2 \sin[\phi_2]) $$

Writing the resultant as $y[t] = A \cos[\omega t + \phi] = \cos[\omega t]\, A \cos\phi - \sin[\omega t]\, A \sin\phi$ and
matching terms (real parts add to real parts, etc.), we can define the real and imaginary parts of
the resultant:

$$ \mathrm{Re}\{(A, \phi)\} = A \cos\phi = A_1 \cos[\phi_1] + A_2 \cos[\phi_2] $$
$$ \mathrm{Im}\{(A, \phi)\} = A \sin\phi = A_1 \sin[\phi_1] + A_2 \sin[\phi_2]. $$

The squared magnitude of the result is:

$$ (A \sin\phi)^2 + (A \cos\phi)^2 = A^2 = (A_1 \sin[\phi_1] + A_2 \sin[\phi_2])^2 + (A_1 \cos[\phi_1] + A_2 \cos[\phi_2])^2 $$
$$ \implies A^2 = A_1^2 + A_2^2 + 2 A_1 A_2 \cos(\phi_1 - \phi_2), $$

and the phase is:

$$ \frac{A \sin\phi}{A \cos\phi} = \tan\phi = \frac{A_1 \sin[\phi_1] + A_2 \sin[\phi_2]}{A_1 \cos[\phi_1] + A_2 \cos[\phi_2]} $$
$$ \implies \phi = \tan^{-1}\left[\frac{A_1 \sin[\phi_1] + A_2 \sin[\phi_2]}{A_1 \cos[\phi_1] + A_2 \cos[\phi_2]}\right] $$
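The amplitude and phase formulas for the resultant can be tested against a direct sum of the two cosines. The sketch below is illustrative: the function name `superpose` and the sample values are assumptions, and `math.atan2` is substituted for the plain inverse tangent so that the phase lands in the correct quadrant.

```python
import math

def superpose(A1, phi1, A2, phi2):
    """Amplitude and phase of A1*cos(wt + phi1) + A2*cos(wt + phi2)."""
    re = A1 * math.cos(phi1) + A2 * math.cos(phi2)     # A*cos(phi)
    im = A1 * math.sin(phi1) + A2 * math.sin(phi2)     # A*sin(phi)
    # Law-of-cosines form; max() guards against tiny negative round-off.
    A_squared = A1 ** 2 + A2 ** 2 + 2.0 * A1 * A2 * math.cos(phi1 - phi2)
    return math.sqrt(max(A_squared, 0.0)), math.atan2(im, re)

# Assumed example values.
A1, phi1, A2, phi2, omega = 1.5, 0.2, 0.8, 2.1, 4.0
A, phi = superpose(A1, phi1, A2, phi2)

# Largest difference between the direct sum and the single resultant cosine.
worst = max(
    abs(A1 * math.cos(omega * t + phi1) + A2 * math.cos(omega * t + phi2)
        - A * math.cos(omega * t + phi))
    for t in (0.05 * n for n in range(100))
)
```

Calling `superpose(1.0, 0.3, 1.0, 0.3)` gives twice the amplitude at the same phase, and a phase difference of $\pi/2$ between equal amplitudes gives $\sqrt{2}$ times the amplitude at the midway phase, matching the simple cases worked out in the text.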


Consider  some  simple  cases: 


1. $A_1 = A_2$, $\phi_1 = \phi_2$ $\implies$ same amplitude, same phase:

$$ A_1 \cos[\omega t + \phi_1] + A_1 \cos[\omega t + \phi_1] = 2 A_1 \cos[\omega t + \phi_1] \implies A = 2A_1,\ \phi = \phi_1 $$
$$ A^2 = A_1^2 + A_1^2 + 2 A_1 A_1 \cos(\phi_1 - \phi_1) = 2A_1^2 + 2A_1^2 \cos(0) = 4A_1^2 $$
$$ A = 2A_1 $$
$$ \tan\phi = \frac{2 A_1 \sin[\phi_1]}{2 A_1 \cos[\phi_1]} = \tan[\phi_1] \implies \phi = \phi_1 $$

Addition of two identical oscillations gives a resultant with twice the amplitude
and the same phase, as expected.


2. $A_1 = A_2$, $\phi_2 = (\phi_1 - \pi)$ $\implies$ same amplitude, phase difference of $\pi$ radians:

$$ A^2 = A_1^2 + A_1^2 + 2 A_1 A_1 \cos[\phi_1 - (\phi_1 - \pi)] $$
$$ = 2A_1^2 + 2A_1^2 \cos[\pi] = 2A_1^2 - 2A_1^2 = 0 $$
$$ \implies A = 0 $$

Both the numerator and the denominator of $\tan\phi$ vanish, so $\phi$ is indeterminate, but $\phi$ is irrelevant since the amplitude $A = 0$.

Addition of two oscillations with the same amplitude but out of phase by $\pm\pi$ radians
gives zero output, also as expected.
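3. $A_1 = A_2$, $\phi_2 = \phi_1 + \frac{\pi}{2}$ $\implies$ same amplitude, phase difference of $\frac{\pi}{2}$ radians.
The resultant has magnitude:

$$ A^2 = A_1^2 + A_1^2 + 2 A_1^2 \cos\left[\phi_1 - \left(\phi_1 + \frac{\pi}{2}\right)\right] = 2A_1^2 \left(1 + \cos\left[-\frac{\pi}{2}\right]\right) $$
$$ = 2A_1^2 \implies A = \sqrt{2} A_1, $$

and phase:

$$ \tan\phi = \frac{A_1 \sin[\phi_1] + A_1 \sin\left[\phi_1 + \frac{\pi}{2}\right]}{A_1 \cos[\phi_1] + A_1 \cos\left[\phi_1 + \frac{\pi}{2}\right]} $$

Since $\cos\left[\phi_1 + \frac{\pi}{2}\right] = -\sin[\phi_1]$ and $\sin\left[\phi_1 + \frac{\pi}{2}\right] = \cos[\phi_1]$,

$$ \tan\phi = \frac{\cos[\phi_1] + \sin[\phi_1]}{\cos[\phi_1] - \sin[\phi_1]} = \frac{\cos[\phi_1] (1 + \tan[\phi_1])}{\cos[\phi_1] (1 - \tan[\phi_1])} = \frac{\tan\left[\frac{\pi}{4}\right] + \tan[\phi_1]}{1 - \tan\left[\frac{\pi}{4}\right] \tan[\phi_1]}, \text{ since } \tan\left[+\frac{\pi}{4}\right] = 1. $$

Now we cheat: from a table of trigonometric properties, we can find that:

$$ \tan[\alpha + \beta] = \frac{\tan\alpha + \tan\beta}{1 - \tan\alpha \tan\beta} $$

which leads to the observation:

$$ \tan\phi = \frac{\tan\left[\frac{\pi}{4}\right] + \tan[\phi_1]}{1 - \tan\left[\frac{\pi}{4}\right] \tan[\phi_1]} = \tan\left[\phi_1 + \frac{\pi}{4}\right] $$

and the phase of the resultant is:

$$ \phi = \phi_1 + \frac{\pi}{4} \quad \left(\text{if } \phi_1 = 0, \text{ then } \phi = +\frac{\pi}{4}\right) $$

If you add two oscillations with the same amplitude and a phase difference of
$+\frac{\pi}{2}$, the resultant has the Pythagorean amplitude $A = \sqrt{A_1^2 + A_2^2}$ and a phase angle
midway between those of the components.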


3.8.2  Phasor  Representation 

Phasors  are  useful  for  computing  the  magnitude  and  phase  resulting  of  the  superpo¬ 
sition  (sum)  of  two  (or  more)  oscillators  with  the  same  frequency.  The  resultant  of 
the  superposition  of  two  oscillators  is  the  vector  sum  of  the  phasors  defining  the  two 
oscillators: 
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2/i  [t]  =  Ai  sin  [c ot  +  fa]  =  (Ai,  fa) 

y2  [t]  =  A2  sin  [ut  +  fa]  =  {A2,fa). 

The  resultant  phasor  is  (A,  fa  =  (Ai,  fa)  +  (A2,  fa).  The  magnitude  can  be  computed 
by  adding  the  real  and  imaginary  parts  separately: 

Re  {A}  =  Re  {Ai}  +  Re  {A2}  =  Ai  cos  [fa]  +  A2  cos  [fa] 

Im  {A}  =  Im  {Ai}  +  Im  {A2}  =  Ax  sin  [fa]  +  A2  sin  [fa] 

Since the two oscillators have the same frequency $\omega$, the relative phase of the two
oscillators is invariant, and thus the relative initial phase is sufficient to compute the
relative phase of the resultant:

$$\phi = \tan^{-1}\left[\frac{A_1\sin[\phi_1] + A_2\sin[\phi_2]}{A_1\cos[\phi_1] + A_2\cos[\phi_2]}\right].$$

n.b., if the oscillators have different frequencies, the relative phase $\phi_1[t] - \phi_2[t]$ varies
with time and the phasor picture is not useful.

The  magnitude  also  may  be  computed  by  using  the  law  of  cosines: 

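$$\begin{aligned}
A^2 &= A_1^2 + A_2^2 - 2A_1A_2\cos[\phi_1 - (\phi_2 - \pi)] \\
&= A_1^2 + A_2^2 - 2A_1A_2\cos[\pi - (\phi_2 - \phi_1)] \\
&= A_1^2 + A_2^2 + 2A_1A_2\cos[\phi_2 - \phi_1] \\
&= A_1^2 + A_2^2 + 2A_1A_2\cos[\phi_1 - \phi_2],
\end{aligned}$$

where the last step follows because $\cos[\theta] = \cos[-\theta]$.

3.8.3  Complex Notation

Consider the complex representation of two oscillators with the same frequency $\omega$:

$$y_1[t] = A_1 e^{i[\omega t + \phi_1]}$$

$$y_2[t] = A_2 e^{i[\omega t + \phi_2]}$$

$$\begin{aligned}
y[t] = A e^{i(\omega t + \phi)} = e^{i\omega t} A e^{i\phi} &= y_1[t] + y_2[t] = A_1 e^{i[\omega t + \phi_1]} + A_2 e^{i[\omega t + \phi_2]} \\
&= e^{i\omega t}\left[A_1 e^{i\phi_1} + A_2 e^{i\phi_2}\right] = e^{i\omega t} A e^{i\phi}
\end{aligned}$$

$$\implies A e^{i\phi} = A_1 e^{i\phi_1} + A_2 e^{i\phi_2}$$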

n.b., the resultant oscillation has the same frequency $\omega$ as the components.

The last line represents the sum of two phasors: $(A_1, \phi_1)$, $(A_2, \phi_2)$. This was solved
in the previous section:

$$A = \sqrt{A_1^2 + A_2^2 + 2A_1A_2\cos(\phi_1 - \phi_2)}$$

$$\tan[\phi] = \frac{A_1\sin[\phi_1] + A_2\sin[\phi_2]}{A_1\cos[\phi_1] + A_2\cos[\phi_2]}$$

THE SUPERPOSITION OF TWO SAME-FREQUENCY OSCILLATIONS
IS AN OSCILLATION OF THAT FREQUENCY

(Fourier  analysis  makes  this  statement  obvious!) 
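
The phasor sum above can be checked numerically. The following is a minimal sketch, assuming unit-free amplitudes and phases in radians; the function names `add_phasors` and `amplitude_from_law_of_cosines` are illustrative and not part of the original notes. It adds two phasors as complex numbers and compares the result with the closed-form amplitude.

```python
import cmath
import math


def add_phasors(a1, phi1, a2, phi2):
    """Sum two same-frequency phasors; return (amplitude, phase) of the resultant."""
    # A1*exp(i*phi1) + A2*exp(i*phi2); the common factor exp(i*omega*t) is dropped.
    total = cmath.rect(a1, phi1) + cmath.rect(a2, phi2)
    return abs(total), cmath.phase(total)


def amplitude_from_law_of_cosines(a1, phi1, a2, phi2):
    """Closed-form amplitude: sqrt(A1^2 + A2^2 + 2*A1*A2*cos(phi1 - phi2))."""
    return math.sqrt(a1**2 + a2**2 + 2.0 * a1 * a2 * math.cos(phi1 - phi2))


# Equal amplitudes with a phase difference of pi/2.
amp, phase = add_phasors(1.0, 0.0, 1.0, math.pi / 2)
print(amp, phase)
```

For equal amplitudes and a phase difference of $\pi/2$, the sketch should reproduce the amplitude $\sqrt{2}\,A_1$ and a resultant phase midway between the two component phases.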


3.9  Superposition of Many Same-Frequency Oscillators

Since  the  sum  of  two  same-frequency  oscillations  is  a  harmonic  oscillation  of  that 
frequency,  clearly  the  sum  of  N  same-frequency  oscillations  must  also  be  a  harmonic 
oscillation  of  that  frequency.  This  is  easy  to  prove  using  complex  notation: 


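$$y_n[t] = A_n e^{i(\omega_0 t + \phi_n)}$$

$$y[t] = \sum_{n=1}^{N} A_n e^{i(\omega_0 t + \phi_n)} = e^{i\omega_0 t}\sum_{n=1}^{N} A_n e^{i\phi_n} = e^{i\omega_0 t}\left(A e^{i\Phi}\right).$$

The resultant oscillation has amplitude $A$ and phase $\Phi$, and hence may be specified
by the phasor $(A, \Phi)$:

$$A e^{i\Phi} = \sum_{n=1}^{N} A_n e^{i\phi_n}$$

$$\mathrm{Re}\{A e^{i\Phi}\} = \mathrm{Re}\left\{\sum_{n=1}^{N} A_n e^{i\phi_n}\right\} = \sum_{n=1}^{N} A_n\cos[\phi_n]$$

$$\mathrm{Im}\{A e^{i\Phi}\} = \mathrm{Im}\left\{\sum_{n=1}^{N} A_n e^{i\phi_n}\right\} = \sum_{n=1}^{N} A_n\sin[\phi_n]$$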

By the Pythagorean theorem:

$$A^2 = [\mathrm{Re}\{A\}]^2 + [\mathrm{Im}\{A\}]^2 = \left[\sum_{n=1}^{N} A_n\cos[\phi_n]\right]^2 + \left[\sum_{n=1}^{N} A_n\sin[\phi_n]\right]^2$$

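Look at the square of the real part:

$$\begin{aligned}
\left[\sum_{n=1}^{N} A_n\cos[\phi_n]\right]^2 &= \left[\sum_{j=1}^{N} A_j\cos[\phi_j]\right]\left[\sum_{k=1}^{N} A_k\cos[\phi_k]\right] \\
&= \sum_{j=1}^{N}\sum_{k=1}^{N} A_jA_k\cos[\phi_j]\cos[\phi_k] \\
&= A_1^2\cos^2[\phi_1] + A_1A_2\cos[\phi_1]\cos[\phi_2] + A_2A_1\cos[\phi_2]\cos[\phi_1] + A_2^2\cos^2[\phi_2] + \cdots \\
&= A_1^2\cos^2[\phi_1] + A_2^2\cos^2[\phi_2] + \cdots + 2A_1A_2\cos[\phi_1]\cos[\phi_2] + 2A_2A_3\cos[\phi_2]\cos[\phi_3] + \cdots
\end{aligned}$$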

This may be rewritten as the sum of the squared terms for $j = k = 1, \ldots, N$ (which
we will index by $n = j = k$) and the sum of the terms for which $j \neq k$. The second
set includes two identical terms for each pair of indices, which may be combined by
considering only the values $j > k$:

$$\begin{aligned}
\left[\sum_{n=1}^{N} A_n\cos[\phi_n]\right]^2 &= \left[\sum_{j=1}^{N} A_j\cos[\phi_j]\right]\left[\sum_{k=1}^{N} A_k\cos[\phi_k]\right] \\
&= \sum_{n=1}^{N} A_n^2\cos^2[\phi_n] + \sum_{j \neq k}^{N} A_jA_k\cos[\phi_j]\cos[\phi_k] \\
&= \sum_{n=1}^{N} A_n^2\cos^2[\phi_n] + 2\sum_{j>k}^{N} A_jA_k\cos[\phi_j]\cos[\phi_k],
\end{aligned}$$

i.e., $j = [2, N]$, $k = [1, N-1]$ with $k < j$.

The treatment for the imaginary part is identical:

$$\begin{aligned}
\left[\sum_{n=1}^{N} A_n\sin[\phi_n]\right]^2 &= \sum_{j=1}^{N} A_j^2\sin^2[\phi_j] + \sum_{j \neq k}^{N} A_jA_k\sin[\phi_j]\sin[\phi_k] \\
&= \sum_{j=1}^{N} A_j^2\sin^2[\phi_j] + 2\sum_{j>k}^{N} A_jA_k\sin[\phi_j]\sin[\phi_k]
\end{aligned}$$


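Therefore the square of the resulting magnitude may be written as the sum of these
four sums:

$$\begin{aligned}
A^2 &= \left[\sum_{n=1}^{N} A_n\cos[\phi_n]\right]^2 + \left[\sum_{n=1}^{N} A_n\sin[\phi_n]\right]^2 \\
&= \sum_{n=1}^{N} A_n^2\cos^2[\phi_n] + \sum_{n=1}^{N} A_n^2\sin^2[\phi_n] + 2\sum_{j>k}^{N} A_jA_k\cos[\phi_j]\cos[\phi_k] + 2\sum_{j>k}^{N} A_jA_k\sin[\phi_j]\sin[\phi_k] \\
&= \sum_{n=1}^{N} A_n^2\left[\cos^2[\phi_n] + \sin^2[\phi_n]\right] + 2\sum_{j>k}^{N} A_jA_k\left(\cos[\phi_j]\cos[\phi_k] + \sin[\phi_j]\sin[\phi_k]\right)
\end{aligned}$$

Now apply the trigonometric identity $\cos[\phi_j]\cos[\phi_k] + \sin[\phi_j]\sin[\phi_k] = \cos[\phi_j - \phi_k]$:

$$A^2 = \sum_{n=1}^{N} A_n^2 + 2\sum_{j>k}^{N} A_jA_k\cos[\phi_j - \phi_k]$$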

Since  the  phase  angles  are  randomly  distributed,  the  phase  angle  of  the  resultant  is 
randomly  distributed  as  well  -  therefore,  no  prediction  of  the  phase  can  be  made. 
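
As a sanity check on the pairwise formula for $A^2$, the sketch below compares it with a direct complex sum. It is only an illustration: the amplitudes and phases are arbitrary test values, and the function names are assumptions introduced here.

```python
import cmath
import math


def resultant_squared_direct(amps, phases):
    """|sum_n A_n exp(i phi_n)|^2 computed directly with complex arithmetic."""
    total = sum(a * cmath.exp(1j * p) for a, p in zip(amps, phases))
    return abs(total) ** 2


def resultant_squared_formula(amps, phases):
    """sum_n A_n^2 + 2 * sum_{j>k} A_j A_k cos(phi_j - phi_k)."""
    n = len(amps)
    diagonal = sum(a * a for a in amps)
    cross = sum(
        amps[j] * amps[k] * math.cos(phases[j] - phases[k])
        for j in range(n)
        for k in range(j)  # only pairs with k < j, each counted once
    )
    return diagonal + 2.0 * cross


amps = [1.0, 0.5, 2.0, 1.5]       # arbitrary test amplitudes
phases = [0.3, 1.2, -2.0, 2.9]    # arbitrary test phases (radians)
print(resultant_squared_direct(amps, phases), resultant_squared_formula(amps, phases))
```

The two printed values should agree to within floating-point rounding.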


3.10  Superposition of Randomly Phased Oscillators

Special Case I: The oscillators have identical amplitudes ($A_j = A_k = A_0$) and
phases that are randomly distributed over the full domain of possible phase angles.

Random phases $\implies$ $\phi_j$ is randomly distributed in the interval $[0, 2\pi)$ (i.e., $0 \le \phi < 2\pi$)
or equivalently in the interval $[-\pi, +\pi)$ (so that $-\pi \le \phi < +\pi$).

$\phi_j - \phi_k$ is randomly distributed in $(-2\pi, 2\pi)$, and so (modulo $2\pi$) is randomly distributed
in $[0, 2\pi)$

$\implies \cos[\phi_j - \phi_k]$ is randomly distributed over the interval $[-1, 1]$

$$\implies A^2 = \sum_{n=1}^{N} A_0^2 + 2 \cdot A_0^2\sum_{j>k}^{N}\cos[\phi_j - \phi_k]$$

Since $\cos[\phi_j - \phi_k]$ is randomly distributed over the interval $[-1, 1]$, we expect that
the positive and negative terms will tend to cancel, so that on average the sum is zero:

$$\sum_{j>k}^{N}\cos[\phi_j - \phi_k] \cong 0$$

so that the squared amplitude of the resultant should be:

$$A^2 = \sum_{n=1}^{N} A_0^2 = N \cdot A_0^2$$

The phase of the sum of the random-phase oscillators cannot be predicted; it can
be any angle in the interval $[-\pi, +\pi)$.

Recall that the energy of the oscillator is proportional to $A^2$, so if the phases
are random, the total energy is the sum of the individual energies, as expected. Note
that the total amplitude is $\sqrt{N}$ times as large as the individual amplitude. Randomly
phased oscillators are said to be incoherent.
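
The $N \cdot A_0^2$ result holds on average rather than for any single draw of phases. A small Monte Carlo sketch makes this concrete; the number of trials, the seed, and the function name `mean_squared_amplitude` are arbitrary choices made for this illustration.

```python
import cmath
import math
import random


def mean_squared_amplitude(n_osc, a0=1.0, trials=4000, seed=1):
    """Average |sum of n_osc phasors|^2 over many draws of uniformly random phases."""
    rng = random.Random(seed)
    accumulated = 0.0
    for _ in range(trials):
        total = sum(
            cmath.rect(a0, rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_osc)
        )
        accumulated += abs(total) ** 2
    return accumulated / trials


estimate = mean_squared_amplitude(50)
print(estimate)  # should be close to N * A0^2 = 50, up to statistical scatter
```

Any single trial can differ substantially from $N \cdot A_0^2$; only the average over many trials settles near that value.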


[Figure: Two examples of superposition of randomly phased oscillators, showing resultant magnitudes.]


3.11  Superposition of Nonrandomly Phased Oscillators

Special Case II: Amplitudes AND phases are equal, i.e., $A_j = A_k = A_0$ and
$\phi_j = \phi_k = \phi_0$. The sum over $j > k$ runs over all $\frac{N(N-1)}{2}$ distinct pairs of oscillators:

$$\begin{aligned}
I = A^2 &= N \cdot A_0^2 + 2 \cdot A_0^2\sum_{j>k}^{N}\cos(\phi_0 - \phi_0) \\
&= N \cdot A_0^2 + 2 \cdot A_0^2\sum_{j>k}^{N} 1 \\
&= A_0^2\left(N + N(N-1)\right) \\
&= N^2 A_0^2
\end{aligned}$$

Examples:

$N = 1 \implies I = A_0^2 = I_0$, one oscillator
$N = 2 \implies I = 4A_0^2 = 4I_0 \implies 4 \times$ energy of one oscillator
$N = 3 \implies I = 9A_0^2 = 9I_0$
$N = 4 \implies I = 16A_0^2 = 16I_0$

n.b., $I > N I_0$ for $N > 1$: the intensity of the sum of $N$ in-phase oscillators is larger than
expected, i.e., the noise is louder, or the light is brighter. Of course, energy must be
conserved, so if the signal is “louder” or “brighter” at some locations, it must be “less
loud” or “dimmer” at other locations.

If the phase relationship between the component oscillators is well-defined, the
oscillators are coherent.
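
For comparison with the random-phase case, the sketch below sums $N$ phasors with identical amplitude and phase, for which the resultant amplitude is $N A_0$ and the intensity is $N^2 A_0^2$. It also sums eight equal phasors whose phases are evenly spaced around the circle, which is one hypothetical way to obtain a null output. All names and values are illustrative assumptions.

```python
import cmath
import math


def intensity(phasors):
    """Squared magnitude of the sum of (amplitude, phase) pairs."""
    total = sum(cmath.rect(a, p) for a, p in phasors)
    return abs(total) ** 2


def coherent_intensity(n, a0=1.0, phi0=0.7):
    """Intensity of n identical in-phase oscillators."""
    return intensity([(a0, phi0)] * n)


for n in (1, 2, 3, 4):
    print(n, coherent_intensity(n))

# Well-defined (nonrandom) phases can also cancel completely:
evenly_spaced = [(1.0, 2.0 * math.pi * m / 8) for m in range(8)]
null_intensity = intensity(evenly_spaced)
print(null_intensity)
```

Both cases are coherent in the sense that the phase relationship is fixed; the difference is whether the phasors line up or close into a polygon.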


[Figure: Two examples of the sum of nonrandomly phased oscillators, plotted against the real axis. In one case (shown in black), the sum yields a “null output” (resulting magnitude is 0). In the other case, the sum is nonzero.]

3.12  Superposition of Oscillations with Different Frequencies

3.12.1  Complex  Notation 

H§7.5,  HR  §20 

We have just seen that the superposition of any number of same-frequency oscillators
is an oscillation with that frequency. When superposing a number of oscillators
with different frequencies, the situation is quite different: (almost) any periodic
function can be synthesized from the summation of harmonic terms. This is the principle
of Fourier analysis.

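Simple Example - Addition of two oscillators of same amplitude $A_0$, same phase
$\phi_0$, different frequencies $\omega_1$ and $\omega_2$:

$$\begin{aligned}
y[t] = y_1[t] + y_2[t] &= A_0\cos[\omega_1 t] + A_0\cos[\omega_2 t] \\
&= A_0\left(\cos[\omega_1 t] + \cos[\omega_2 t]\right) \\
&= \mathrm{Re}\left\{A_0\left[e^{i\omega_1 t} + e^{i\omega_2 t}\right]\right\}
\end{aligned}$$

Note that we can do both the sum of the cosines and of the sines at the same time,
as the real and imaginary parts of the complex sum $A_0\left[e^{i\omega_1 t} + e^{i\omega_2 t}\right]$:

$$\begin{aligned}
A_0\left(\cos[\omega_1 t] + \cos[\omega_2 t]\right) &= A_0\,\mathrm{Re}\left\{e^{i\omega_1 t} + e^{i\omega_2 t}\right\} \\
&= A_0\,\mathrm{Re}\left\{2\cos\left[\left(\frac{\omega_1 - \omega_2}{2}\right)t\right] e^{+i\left(\frac{\omega_1 + \omega_2}{2}\right)t}\right\} \\
&= 2A_0\cos\left[\left(\frac{\omega_1 - \omega_2}{2}\right)t\right]\mathrm{Re}\left\{e^{+i\left(\frac{\omega_1 + \omega_2}{2}\right)t}\right\}
\end{aligned}$$

$$A_0\left(\cos[\omega_1 t] + \cos[\omega_2 t]\right) = 2A_0\cos\left[\left(\frac{\omega_1 + \omega_2}{2}\right)t\right] \cdot \cos\left[\left(\frac{\omega_1 - \omega_2}{2}\right)t\right]$$

$$\begin{aligned}
A_0\left(\sin[\omega_1 t] + \sin[\omega_2 t]\right) &= A_0\,\mathrm{Im}\left\{e^{i\omega_1 t} + e^{i\omega_2 t}\right\} \\
&= A_0\,\mathrm{Im}\left\{2\cos\left[\left(\frac{\omega_1 - \omega_2}{2}\right)t\right] e^{+i\left(\frac{\omega_1 + \omega_2}{2}\right)t}\right\} \\
&= 2A_0\cos\left[\left(\frac{\omega_1 - \omega_2}{2}\right)t\right]\mathrm{Im}\left\{e^{+i\left(\frac{\omega_1 + \omega_2}{2}\right)t}\right\}
\end{aligned}$$

$$A_0\left(\sin[\omega_1 t] + \sin[\omega_2 t]\right) = 2A_0\cos\left[\left(\frac{\omega_1 - \omega_2}{2}\right)t\right] \cdot \sin\left[\left(\frac{\omega_1 + \omega_2}{2}\right)t\right]$$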

By defining an average and a modulation (angular) frequency:

$$\Omega_{avg} = \frac{\omega_1 + \omega_2}{2}, \qquad \Omega_{mod} = \frac{\omega_1 - \omega_2}{2}$$

we obtain:

$$y[t] = 2A_0\cos[\Omega_{avg} t]\cos[\Omega_{mod} t]$$

In words, the sum of two harmonic oscillations with different frequencies $\omega_1$ and
$\omega_2$ yields the product of two harmonic oscillations, one with the average frequency
$\Omega_{avg} = \frac{\omega_1 + \omega_2}{2}$, and one with the so-called modulation frequency $\Omega_{mod} = \frac{\omega_1 - \omega_2}{2}$.

Both the product and sum of different-frequency sinusoids yield results that are not
harmonic. The former is equivalent to the sum of sinusoids at the sum and difference
frequencies, while the sum is equivalent to the product of sinusoids at $\Omega_{avg}$ and $\Omega_{mod}$.
The periods of the superposition are $T_{avg}$ and $T_{mod}$, where $T_{mod} > T_{avg}$. The slower
period $T_{mod}$ is the source of the phenomenon known commonly as beats, from its
musical context, though this kind of pattern is seen (heard?) in many other situations
as well. The low-frequency Moiré fringes seen when two periodic patterns are overlaid
are one example. The phenomenon of aliasing in digital signal/image processing is closely
related.


The converse is also true: the product of two periodic signals can be expressed as
the sum of two other oscillations. The heterodyning operation in radio is an example.
AM radio signals are broadcast at frequencies $560\ \text{kHz} \le \nu_1 \le 1600\ \text{kHz}$. To render
the signals audible, they are beat down by multiplying by an intermediate frequency
(IF) $\nu_2$. Two signals result: one with frequency $\nu_1 + \nu_2$ and one with $\nu_1 - \nu_2$. Judicious
choice of $\nu_2$ puts the lower-frequency sideband in the audible range. The upper
sideband is removed by a filter which passes only low frequencies (low-pass filter).


Example:

Consider the product and sum of two harmonic oscillations with temporal frequencies
$\nu_1 = \frac{1}{50}$ and $\nu_2 = \frac{1}{60}$ cycles per unit time, so the corresponding temporal periods
are $T_1 = 50$ and $T_2 = 60$. These are illustrated below:

[Figure: Sum and product of oscillations: (a) $f_1[t] = \cos\left[2\pi\frac{t}{50}\right]$. (b) $f_2[t] = \cos\left[2\pi\frac{t}{60}\right]$. (c) $f_1[t] + f_2[t]$, also showing modulation wave. (d) $f_1[t] \times f_2[t]$, showing difference-frequency wave.]


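The sum of these two oscillations is:

$$\begin{aligned}
\cos[2\pi\nu_1 t] + \cos[2\pi\nu_2 t] &= 2\cos\left[2\pi\left(\frac{\nu_1 + \nu_2}{2}\right)t\right] \cdot \cos\left[2\pi\left(\frac{\nu_1 - \nu_2}{2}\right)t\right] \\
&= 2\cos\left[2\pi\left(\frac{11}{600}\right)t\right] \cdot \cos\left[2\pi\left(\frac{1}{600}\right)t\right] \\
&\cong 2\cos\left[2\pi\frac{t}{54.545}\right] \cdot \cos\left[2\pi\frac{t}{600}\right]
\end{aligned}$$

$$\nu_{avg} \cong \frac{1}{54.545}, \qquad \nu_{mod} = \frac{1}{600}$$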


The  “slowly”  varying  term  with  period  600  is  generally  more  visible. 


The product of the two sinusoids may be written as the scaled sum of sinusoids
at the sum and difference frequencies, where the former oscillates at a rapid rate and
the latter oscillates more slowly:

$$\begin{aligned}
\cos[2\pi\nu_1 t] \cdot \cos[2\pi\nu_2 t] &= \frac{1}{2}\cos[2\pi(\nu_1 + \nu_2)t] + \frac{1}{2}\cos[2\pi(\nu_1 - \nu_2)t] \\
&\cong \frac{1}{2}\cos\left[2\pi\frac{t}{27.27}\right] + \frac{1}{2}\cos\left[2\pi\frac{t}{300}\right]
\end{aligned}$$

$$\nu_{sum} \cong \frac{1}{27.27}\ \text{Hz}, \qquad \nu_{diff} = \frac{1}{300}\ \text{Hz}$$

3.13  Introduction  to  Fourier  Analysis 


The motion resulting from the sum of two oscillations of different frequency is complex
(i.e., anharmonic) though still periodic since it repeats after a time defined by:

$$T_{mod} = \frac{2\pi}{\Omega_{mod}} = \frac{4\pi}{\omega_1 - \omega_2}$$

As $\omega_1 \to \omega_2$, $T_{mod}$ lengthens. In the limit, $T_{mod} \to \infty$ and $T_{avg} \to T_1 = T_2$.

The addition of more oscillations of different frequencies produces more and more
complex motion (less like harmonic motion). For example, consider this sum of
harmonic oscillators:

$$y[t] = \sum_{n=1,3,5,\ldots}^{\infty}\left(\pm\frac{1}{n}\cos[n\omega_0 t]\right)$$

For each succeeding term, the amplitude decreases and the frequency increases. The
first term (fundamental) and the next few terms are:

$$f_1[t] = \cos[\omega_0 t]$$

$$f_2[t] = 0$$

$$f_3[t] = -\frac{1}{3}\cos[3\omega_0 t]$$

$$f_4[t] = 0$$

$$f_5[t] = +\frac{1}{5}\cos[5\omega_0 t]$$


Obviously, $y[t]$ is becoming less and less harmonic as more terms are added, and in fact
is starting to look like a completely different function - a square wave. Especially note
that as higher frequency components are added (i.e., larger values of $n$), the verticals
become “steeper” and the edges become “sharper.” Note also that the summation
overshoots when transitioning from horizontal to vertical and vice versa. This is
known as the Gibbs phenomenon. As more terms are added, the overshoot is confined to
a narrower region near each edge (although its height does not shrink to zero), so the
effect becomes less visible. As $N \to \infty$, the function $y[t]$ becomes a periodic square wave,
which is quite dissimilar from the component functions.

This result illustrates the principle of Fourier analysis, where we determine the set
of sinusoidal constituents that sum to create the function $f[t]$. The complementary
operation of Fourier synthesis sums up a set of sinusoids to find the resultant.
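
A brief numerical sketch of this synthesis follows. It assumes the alternating signs $+, -, +, \ldots$ for $n = 1, 3, 5, \ldots$ shown in the listed terms, sets $\omega_0 = 1$ for convenience, and uses illustrative function names. With these signs the partial sums approach $+\pi/4$ on the flat top of the square wave and $-\pi/4$ on the flat bottom.

```python
import math


def partial_sum(t, n_max, omega0=1.0):
    """Sum of (+/-)(1/n) cos(n*omega0*t) over odd n <= n_max, signs alternating +, -, +, ..."""
    total = 0.0
    sign = 1.0
    for n in range(1, n_max + 1, 2):
        total += sign * math.cos(n * omega0 * t) / n
        sign = -sign
    return total


def peak_value(n_max, samples=5000):
    """Largest value of the partial sum on a grid over [0, pi/2] (illustrates the overshoot)."""
    return max(partial_sum(0.5 * math.pi * i / samples, n_max) for i in range(samples + 1))


print(partial_sum(0.0, 2001), math.pi / 4)   # flat top approaches pi/4
print(peak_value(21), peak_value(201))       # overshoot persists as terms are added
```

In this sketch the peak near the edge stays noticeably above $\pi/4$ for both truncations, while the region over which the overshoot occurs becomes narrower.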


[Figure: Sum of sinusoids with specific different magnitudes and frequencies to produce a square wave.]

(Virtually) every periodic function may be decomposed into a sum of sines and
cosines with definite amplitudes, frequencies, and phases. The decomposition is
unique, and is called the Fourier series representation, or the spectrum of the
periodic function.

The  spectrum  is  a  representation  of  the  amplitudes,  frequencies,  and  phases  of 
the  sinusoidal  components  that  superpose  to  create  the  function.  Often,  the  term 
spectrum  is  used  when  power  spectrum  would  be  more  accurate  -  it  is  the  power  (or 
energy,  the  squared  magnitude  of  the  component)  that  is  plotted  rather  than  the 
amplitude. 

This concept should be quite familiar to you - the spectrum of white light is
analogous. White light is a periodic function - it looks the same at all times. Spherical
rain droplets act as prisms to disperse white light into its constituent components -
the colors of the spectrum. The brightness of each color corresponds to the energy
of the component - brighter $\implies$ more energy. The droplet prisms act as Fourier
transformers since they derive the spectrum of the function. As Newton showed, the
spectrum can be transformed back to white light with another prism.


Chapter  4 

Review:  Traveling  Waves 


4.1  Introduction 

To  date,  we  have  considered  oscillations,  i.e.,  periodic,  often  harmonic,  variations  of  a 
physical  characteristic  of  a  system.  The  system  at  one  time  is  indistinguishable  from 
the  system  observed  at  a  later  time  if  the  time  difference  is  an  integral  number  of 
temporal  periods.  To  maintain  oscillatory  behavior,  the  energy  of  the  oscillator  must 
remain  within  the  system,  i.e.,  there  can  be  no  losses  of  energy.  We  will  now  extend 
this  picture  to  oscillations  that  travel  from  the  source  and  thus  transport  energy  away. 
Energy  must  be  continually  added  to  the  system  to  maintain  the  oscillation  and  the 
transported  energy  can  do  work  on  other  systems  at  a  distance. 

We  can  define  a  traveling  wave  as  “a  self  sustaining  disturbance  of  the  medium 
through  which  it  propagates,”  though  (as  we  shall  see)  sometimes  the  entity  that  can 
be  called  the  “medium”  is  not  so  obvious.  At  this  point,  we  will  ignore  this  problem. 

Our first task is to mathematically describe a traveling harmonic wave, i.e., denote
a $y[t]$ that travels through space. A harmonic oscillation $y[t] = A_0\cos[\omega_0 t]$ can be
converted into a traveling wave by making the phase a function of both $z$ and $t$ in a
very particular way. Consider the general case of an oscillatory function of space and
time:

$$y[z, t] = A_0\cos[\Phi[z, t]].$$

We want this oscillation to move through space, e.g., toward positive $z$. In other
words, if a point of constant phase on the wave (e.g., a peak of the cosine created at
a particular time $\tau$) is at a point $z_0$ in space at a time $t_0$, the same point of constant
phase must move to $z_1 > z_0$ at time $t_1 > t_0$.




[Figure: “Snapshots” of sinusoidal wave at two different times $t_0$ and $t_1 > t_0$, showing motion of the peak originally at the origin at $t_0$. The wave is traveling towards $z = +\infty$ at velocity $v_\phi$. The phase of the first wave at the origin is 0 radians, but that of the second is negative.]


Since the wave at location $z_1$ and time $t_1$ has the same phase as the wave at location
$z_0$ and time $t_0$, we can say that:

$$\Phi[z_0, t_0] = \Phi[z_1, t_1] \implies \cos[\Phi[z_0, t_0]] = \cos[\Phi[z_1, t_1]] \implies y[z_0, t_0] = y[z_1, t_1].$$

In addition, for the wave to maintain its shape, the phase $\Phi[z, t]$ must be a linear
function of $z$ and $t$; otherwise the wave would compress or stretch out at different
locations in space or time. Therefore:

$$\Phi[z, t] = \alpha z + \beta t \implies \alpha z_0 + \beta t_0 = \alpha z_1 + \beta t_1.$$

As discussed, if $t_1 > t_0 \implies z_1 > z_0$ (i.e., wave moves toward $z = +\infty$), then $\alpha$ and $\beta$
must have opposite algebraic signs:

$$\Phi[z, t] = |\alpha| z - |\beta| t$$

By dimensional analysis, we know that $|\alpha| z - |\beta| t$ has the “dimensionless dimensions”
of angle [i.e., measured in the unitless quantity of radians]. We have already identified
$|\beta| = \omega_0$, the angular frequency of the oscillation. Similarly, if $[z]$ = mm, then $\alpha$ must have
dimensions of radians/mm, i.e., $\alpha$ tells how many radians of oscillation exist per unit
length - the angular spatial frequency of the wave, commonly denoted by $k$:

$$y_+[z, t] = A_0\cos[k_0 z - \omega_0 t] \quad \text{(traveling harmonic wave toward } z = +\infty\text{)}$$

By identical analysis, we can derive the equation for a harmonic wave moving toward
$z = -\infty$:

$$y_-[z, t] = A_0\cos[k_0 z + \omega_0 t] \quad \text{(traveling harmonic wave toward } z = -\infty\text{)}$$

The waves are functions of both space and time, i.e., three dimensions $[z, y, t]$ are
needed to portray them. Generally we display $y$ either as a function of $z$ for fixed $t$,
or as a function of $t$ for fixed $z$.


4.1.1  2-D  Plot  of  1-D  Traveling  Wave 

The 1-D traveling wave is a function of two variables: the position $z$ and the time $t$,
and so may be graphed on axes with these labels. An example is shown in the figure,
where $z$ is plotted on the horizontal axis and $t$ on the vertical axis. In this case, the
point at the origin at $t = 0$ has a phase of 0 radians. That point moves in the positive
$z$ direction with increasing time, and so is a wave of the form

$$y[z, t] = \cos[k_0 z - \omega_0 t]$$

The points with the same phase of 0 radians at later times are positioned along
the line shown. The velocity of this point of constant phase is $\frac{\Delta z}{\Delta t}$, and thus is the
reciprocal of the slope of this line.


4.2  Notation  and  Dimensions  for  Waves  in  a  Medium 

Trigonometric Notation:

$$y[z, t] - y_0 = A_0\cos[\Phi[z, t]] = A_0\cos(k_0 z \pm \omega_0 t + \phi_0)$$

Complex Notation:

$$y[z, t] = \mathrm{Re}\left\{A_0 e^{i\Phi[z, t]}\right\} = \mathrm{Re}\left\{A_0 e^{i(k_0 z \pm \omega_0 t + \phi_0)}\right\}$$

$y$ = position of the characteristic of the medium, e.g., $[y]$ = angle, voltage, ...;

$y_0$ = equilibrium value of the characteristic;

$A_0$ = amplitude of the wave, i.e., maximum displacement from equilibrium, $[A_0] = [y]$;

$z, t$ = spatial and temporal coordinates, $[z]$ = length (e.g., mm), $[t]$ = s;

$T$ = period of the wave, $[T]$ = sec [s]; $T = \frac{1}{\nu_0} = \frac{2\pi}{\omega_0}$;

$\lambda_0$ = wavelength, $[\lambda_0]$ = mm;

$\omega_0$ = angular temporal frequency of the wave, $\omega_0 = \frac{2\pi}{T}$, $[\omega_0]$ = radians/s;

$k_0$ = angular spatial frequency of the wave, $k_0 = \frac{2\pi}{\lambda_0}$, $[k_0]$ = radians/mm;

$\nu_0$ = temporal frequency of the wave, $[\nu_0]$ = cycles per second [cycles/s] = Hertz [Hz], $\nu_0 = \frac{\omega_0}{2\pi}$;

$\Phi$ = phase angle of the wave, $[\Phi]$ = radians (in this case, $\Phi$ is linear in time and space);

$\phi_0$ = initial phase of the wave, i.e., phase angle at $t = 0$ and $z = 0$, $[\phi_0]$ = radians;

$\sigma_0$ = wavenumber, $\sigma_0 = \frac{1}{\lambda_0}$, number of wavelengths per unit length, $[\sigma_0]$ = mm$^{-1}$.

Relations between the phase and the temporal frequencies:

$$\nu_0 = \frac{\omega_0}{2\pi} = \frac{1}{2\pi}\left|\frac{\partial\Phi}{\partial t}\right|$$


4.3  Velocity  of  Traveling  Waves 


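The phase velocity $v_\phi$ of a wave is the speed of travel of a point of constant phase. A
definition for phase velocity can be derived by dimensional analysis: $[v_\phi]$ = mm per
s; $[\omega_0]$ = radians per s; $[k_0]$ = radians per mm:

$$\left[\frac{\omega_0}{k_0}\right] = \frac{\text{radians per second}}{\text{radians per mm}} = \frac{\text{radian} \cdot \text{mm}}{\text{radian} \cdot \text{s}} = \frac{\text{mm}}{\text{s}}$$

Slightly more rigorously, we can find the phase velocity of a wave by taking derivatives
of the equation for the wave:

$$y[z, t] = A_0\cos[k_0 z - \omega_0 t + \phi_0]$$

$$\frac{\partial y}{\partial t} = -(-\omega_0)A_0 \cdot \sin[k_0 z - \omega_0 t + \phi_0] = +A_0\omega_0 \cdot \sin[k_0 z - \omega_0 t + \phi_0]$$

$$\frac{\partial y}{\partial z} = -(k_0)A_0 \cdot \sin[k_0 z - \omega_0 t + \phi_0] = -A_0 k_0 \cdot \sin[k_0 z - \omega_0 t + \phi_0]$$

$$v_\phi = \left|\frac{\partial z}{\partial t}\right| = \left|\frac{\partial y / \partial t}{\partial y / \partial z}\right| = \frac{\omega_0}{k_0},$$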

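or by considering the point of constant phase $b$ radians:

$$k_0 z - \omega_0 t = b$$

$$z = \left(\frac{\omega_0}{k_0}\right)t + \frac{b}{k_0} = \left(\frac{\omega_0}{k_0}\right)t + b', \quad \text{where } b' = \frac{b}{k_0} \text{ is a new constant}$$

Consider the positions $z_1$ and $z_2$ of the same point of constant phase at different times
$t_1$ and $t_2$:

$$z_1 = b' + \left(\frac{\omega_0}{k_0}\right)t_1$$

$$z_2 = b' + \left(\frac{\omega_0}{k_0}\right)t_2$$

$$\implies z_1 - z_2 = \Delta z = \left(\frac{\omega_0}{k_0}\right)(t_1 - t_2) = \left(\frac{\omega_0}{k_0}\right)\Delta t$$

$$v_\phi = \frac{\Delta z}{\Delta t} = \frac{\omega_0}{k_0}$$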
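
The relation $v_\phi = \omega_0 / k_0$ can be checked by tracking a peak numerically. The sketch below is illustrative only: the values of `K0` and `OMEGA0`, the search window, and the function names are assumptions chosen so that exactly one peak lies inside the window at both times.

```python
import math

K0 = 2.0       # assumed angular spatial frequency (radians per unit length)
OMEGA0 = 6.0   # assumed angular temporal frequency (radians per unit time)


def wave(z, t):
    """Unit-amplitude harmonic wave traveling toward +z."""
    return math.cos(K0 * z - OMEGA0 * t)


def locate_peak(t, z_min=-1.0, z_max=1.5, steps=2500):
    """Grid search for the position of the wave maximum inside [z_min, z_max] at time t."""
    dz = (z_max - z_min) / steps
    best_z, best_y = z_min, wave(z_min, t)
    for i in range(1, steps + 1):
        z = z_min + i * dz
        y = wave(z, t)
        if y > best_y:
            best_z, best_y = z, y
    return best_z


t0, t1 = 0.0, 0.2
measured_velocity = (locate_peak(t1) - locate_peak(t0)) / (t1 - t0)
expected_velocity = OMEGA0 / K0
print(measured_velocity, expected_velocity)
```

The measured value agrees with $\omega_0 / k_0$ only to within the grid spacing divided by the time step, so a finer grid gives a closer match.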


4.4  Superposition  of  Traveling  Waves 


Consider  the  superposition  of  two  traveling  waves  with  the  same  amplitude,  different 
phase  velocities,  and  different  frequencies: 


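$$y_1[z, t] = A_0\cos[k_1 z - \omega_1 t]$$

$$y_2[z, t] = A_0\cos[k_2 z - \omega_2 t].$$

We can use the same derivation developed for oscillations by defining a new frequency
for both:

$$\Omega_1 = \frac{k_1 z}{t} - \omega_1, \qquad \Omega_2 = \frac{k_2 z}{t} - \omega_2$$

$$\begin{aligned}
y[z, t] = y_1[z, t] + y_2[z, t] &= A_0\left\{\cos[k_1 z - \omega_1 t] + \cos[k_2 z - \omega_2 t]\right\} \\
&= A_0\left\{\cos\left[\left(\frac{k_1 z}{t} - \omega_1\right)t\right] + \cos\left[\left(\frac{k_2 z}{t} - \omega_2\right)t\right]\right\} \\
&= A_0\left\{\cos[\Omega_1 t] + \cos[\Omega_2 t]\right\} \\
&= 2A_0\cos\left[\left(\frac{\Omega_1 + \Omega_2}{2}\right)t\right]\cos\left[\left(\frac{\Omega_1 - \Omega_2}{2}\right)t\right],
\end{aligned}$$

just as before. By evaluating the sum and difference frequencies, we obtain:

$$\left(\frac{\Omega_1 + \Omega_2}{2}\right)t = \left(\frac{k_1 z}{t} - \omega_1 + \frac{k_2 z}{t} - \omega_2\right)\frac{t}{2} = \frac{k_1 + k_2}{2}z - \frac{\omega_1 + \omega_2}{2}t = k_{avg}z - \omega_{avg}t,$$

where $k_{avg} = \frac{k_1 + k_2}{2}$ and $\omega_{avg} = \frac{\omega_1 + \omega_2}{2}$, and

$$\left(\frac{\Omega_1 - \Omega_2}{2}\right)t = \left(\frac{k_1 z}{t} - \omega_1 - \frac{k_2 z}{t} + \omega_2\right)\frac{t}{2} = \frac{k_1 - k_2}{2}z - \frac{\omega_1 - \omega_2}{2}t = k_{mod}z - \omega_{mod}t,$$

where $k_{mod} = \frac{k_1 - k_2}{2}$ and $\omega_{mod} = \frac{\omega_1 - \omega_2}{2}$, so that

$$y[z, t] = 2A_0\cos[k_{avg}z - \omega_{avg}t]\cos[k_{mod}z - \omega_{mod}t].$$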

4.5  Standing  Waves 


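Consider the superposition of two waves with the same amplitude $A_0$, temporal
frequency $\nu_0$, and wavelength $\lambda_0$, but that are traveling in opposite directions:

$$\begin{aligned}
f_1[z, t] + f_2[z, t] &= A_0\cos[k_0 z - \omega_0 t] + A_0\cos[k_0 z + \omega_0 t] \\
&= 2A_0\cos\left[\frac{k_0 z - \omega_0 t}{2} + \frac{k_0 z + \omega_0 t}{2}\right] \cdot \cos\left[\frac{k_0 z - \omega_0 t}{2} - \frac{k_0 z + \omega_0 t}{2}\right] \\
&= 2A_0\cos\left[\frac{k_0 z + k_0 z - \omega_0 t + \omega_0 t}{2}\right] \cdot \cos\left[\frac{k_0 z - k_0 z - \omega_0 t - \omega_0 t}{2}\right] \\
&= 2A_0\cos[k_0 z] \cdot \cos[-\omega_0 t] \\
&= 2A_0\cos[k_0 z] \cdot \cos[\omega_0 t], \quad \text{because } \cos[-\theta] = +\cos[+\theta] \\
&= 2A_0\cos\left[2\pi\frac{z}{\lambda_0}\right] \cdot \cos[2\pi\nu_0 t]
\end{aligned}$$

This is the product of a spatial wave with wavelength $\lambda_0$ and a temporal oscillation
with frequency $\nu_0$.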
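
A quick numerical check of the standing-wave result is sketched below. The parameter values and function names are arbitrary assumptions for illustration. The sketch also confirms that the nodes, where $\cos[k_0 z] = 0$, stay at zero for all sampled times.

```python
import math


def counter_propagating_sum(z, t, a0=1.0, k0=2.0, omega0=3.0):
    """Sum of two equal-amplitude waves traveling toward +z and -z."""
    return a0 * math.cos(k0 * z - omega0 * t) + a0 * math.cos(k0 * z + omega0 * t)


def standing_wave(z, t, a0=1.0, k0=2.0, omega0=3.0):
    """Product form: 2*A0*cos(k0*z)*cos(omega0*t)."""
    return 2.0 * a0 * math.cos(k0 * z) * math.cos(omega0 * t)


# Node positions for k0 = 2: z = (m + 1/2) * pi / k0
nodes = [(m + 0.5) * math.pi / 2.0 for m in range(3)]
node_values = [counter_propagating_sum(z, t) for z in nodes for t in (0.0, 0.4, 1.7)]
print(max(abs(v) for v in node_values))
```

The fixed nodes are what distinguishes the standing wave from a traveling wave, whose zero crossings move with time.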


[Figure: Standing waves produced by the sum of waves traveling in opposite directions, shown as functions of the spatial coordinate at five different times. The sum is a spatial wave whose amplitude oscillates.]


4.6  Anharmonic Traveling Waves, Dispersion


Thus far the only traveling waves we have considered have been harmonic, i.e.,
consisting of a single sinusoidal frequency. From the principle of Fourier analysis, an
anharmonic traveling wave can be decomposed into a sum of traveling harmonic wave
components, i.e., waves of generally differing amplitudes over a discrete set of
frequencies:

$$y[z, t] = \sum_{n=1}^{\infty} y_n = \sum_{n=1}^{\infty} A_n\cos[k_n z - \omega_n t + \phi_n],$$

where $A_n$, $k_n$, and $\omega_n$ are the amplitude, angular spatial frequency, and angular temporal
frequency of the $n$th wave. Therefore, we can define the phase velocity of the $n$th wave
as:

$$(v_\phi)_n = \frac{\omega_n}{k_n}$$

Now suppose that a particular anharmonic oscillation is composed of two harmonic
components $y[z, t] = y_1[z, t] + y_2[z, t]$. If the two components have the same phase
velocity, $(v_\phi)_1 = (v_\phi)_2$, then points of constant phase on the two waves move with the
same speed and maintain the same relative phase. The shape of the resultant wave is
invariant over time. Such a wave is called nondispersive, because points of constant
phase on the components do not separate over time.


What if the phase velocities are different, i.e., if $(v_\phi)_1 \neq (v_\phi)_2$? In this case, points
of constant phase on the two waves will move at different velocities, and therefore
the distance between points of constant phase will change as a function of position
or time. Therefore the shape of the superposition wave will change as a function of
time; these waves are dispersive.


Note that the dispersion is a characteristic of the medium within which the waves
travel, and not of the waves themselves. It is the medium that determines the
velocities and thus whether the waves travel together or if they disperse with time and
space.


4.7  Average Velocity and Modulation (Group) Velocity


We added two traveling waves of different frequencies and obtained the same result
we saw when adding two oscillations: the sum of two harmonic waves yields the
product of two harmonic waves with modulation and average spatial and temporal
frequencies. Using the new terms $k_{avg}$, $k_{mod}$, $\omega_{avg}$, and $\omega_{mod}$, we can define the phase
velocities of the average and modulation waves:

$$v_{avg} = \frac{\omega_{avg}}{k_{avg}} = \frac{\left(\frac{\omega_1 + \omega_2}{2}\right)}{\left(\frac{k_1 + k_2}{2}\right)} = \frac{\omega_1 + \omega_2}{k_1 + k_2}$$

$$v_{mod} = \frac{\omega_{mod}}{k_{mod}} = \frac{\left(\frac{\omega_1 - \omega_2}{2}\right)}{\left(\frac{k_1 - k_2}{2}\right)} = \frac{\omega_1 - \omega_2}{k_1 - k_2}$$

These two velocities have the same meaning as the phase velocity of the single wave,
i.e., each is the velocity of a point of constant phase, either of the average traveling
wave or of the modulation wave (the beats wave). The modulation
velocity is also commonly called the group velocity.

4.7.1  Example: Nondispersive Waves $(v_\phi)_1 = (v_\phi)_2$

In a nondispersive medium, the phase velocity is constant over frequency (or
wavelength), i.e.,

$$(v_\phi)_1 = \frac{\omega_1}{k_1} = (v_\phi)_2 = \frac{\omega_2}{k_2}$$

Note that $\omega_1 \neq \omega_2$ and $k_1 \neq k_2$ - only the ratios are equal. Now find expressions for
$v_{mod}$ and $v_{avg}$.


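$$v_{avg} = \frac{\omega_{avg}}{k_{avg}} = \frac{\frac{\omega_1 + \omega_2}{2}}{\frac{k_1 + k_2}{2}} = \frac{\omega_1 + \omega_2}{k_1 + k_2} = \frac{\omega_1\left(1 + \frac{\omega_2}{\omega_1}\right)}{k_1\left(1 + \frac{k_2}{k_1}\right)}$$

Since $\frac{\omega_1}{k_1} = \frac{\omega_2}{k_2}$ for nondispersive waves $\implies \frac{\omega_2}{\omega_1} = \frac{k_2}{k_1}$ and:

$$v_{avg} = \frac{\omega_1}{k_1} \cdot \frac{1 + \frac{k_2}{k_1}}{1 + \frac{k_2}{k_1}} = \frac{\omega_1}{k_1} \implies v_1 = v_2 = v_{avg}$$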


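Similarly for the velocity of the modulation wave:

$$v_{mod} = \frac{\omega_{mod}}{k_{mod}} = \frac{\frac{\omega_1 - \omega_2}{2}}{\frac{k_1 - k_2}{2}} = \frac{\omega_1 - \omega_2}{k_1 - k_2} = \frac{\omega_1\left(1 - \frac{\omega_2}{\omega_1}\right)}{k_1\left(1 - \frac{k_2}{k_1}\right)}$$

Since $\frac{\omega_1}{k_1} = \frac{\omega_2}{k_2}$ for nondispersive waves, then $\frac{\omega_2}{\omega_1} = \frac{k_2}{k_1}$ and:

$$v_{mod} = \frac{\omega_1}{k_1} \cdot \frac{1 - \frac{k_2}{k_1}}{1 - \frac{k_2}{k_1}} = \frac{\omega_1}{k_1} \implies v_1 = v_2 = v_{mod} = v_{avg}$$

Note also that

$$v_{mod} = \frac{\omega_{mod}}{k_{mod}} = \frac{\omega_1 - \omega_2}{k_1 - k_2} = \frac{\Delta\omega}{\Delta k} \to \frac{d\omega}{dk} \quad \text{as } \Delta k \to 0.$$

In a nondispersive medium, all waves (all spatial and temporal frequencies and all
modulation and average waves) travel at the same velocity.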


4.8  Dispersion Relation for Nondispersive Traveling Waves

Waves are nondispersive in some important physical cases: e.g., light propagation in
a vacuum and audible sound in air. Since $\frac{\omega}{k} = v_\phi$, we can easily express the temporal
angular frequency $\omega$ in terms of the angular wavenumber $k$:

$$\omega = \omega(k) = (v_\phi) \cdot k \quad \text{where } v_\phi \text{ is constant, so that } \omega \propto k.$$

The expression of $\omega$ in terms of $k$ is called a dispersion relation. We can plot $\omega[k]$ vs.
$k$, giving a straight line in the nondispersive case.


[Figure: Dispersion relation for nondispersive waves, showing two types of wave with different velocities $(v_\phi)_1 > (v_\phi)_2$.]


4.9  Dispersive  Traveling  Waves 

The more general, more common, and more important case is that of dispersive waves. Here, the phase velocity $v_\phi = \frac{\omega}{k}$ is not constant; $v_\phi$ varies with frequency. This is the normal state of affairs for light traveling in a medium such as glass. The common specification of the phase velocity of light in a medium is the refractive index $n$:

$$n = \frac{c}{v_\phi}$$

where $v_\phi$ is the phase velocity of light in the medium. In a dispersive medium, we can interpret group velocity in another way:


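$$\omega(k) = k \cdot v_\phi$$

$$v_{mod} = \frac{d\omega}{dk} = \frac{d}{dk}\left(k \cdot v_\phi\right) = v_\phi + k\,\frac{dv_\phi}{dk}$$

In other words, the group velocity is the sum of the phase velocity $v_\phi$ and a term proportional to $\frac{dv_\phi}{dk}$, which is the change in phase velocity with wavenumber:

$$\frac{dv_\phi}{dk} > 0 \implies v_{mod} > v_\phi$$

$$\frac{dv_\phi}{dk} < 0 \implies v_{mod} < v_\phi$$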
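The decomposition $v_{mod} = v_\phi + k\,\frac{dv_\phi}{dk}$ can be compared against a direct numerical derivative of $\omega(k)$. This is a sketch under stated assumptions: the phase-velocity model `v_phase` is hypothetical (chosen only because it decreases with $k$), and the derivative is a simple central difference.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def v_phase(k):
    """Hypothetical phase velocity that decreases as the wavenumber grows."""
    return 1.0 / (1.0 + 0.1 * k)


def omega(k):
    """Dispersion relation built from the phase velocity: omega = k * v_phi(k)."""
    return k * v_phase(k)


k0 = 2.0
v_mod_direct = derivative(omega, k0)                    # d(omega)/dk
v_mod_sum = v_phase(k0) + k0 * derivative(v_phase, k0)  # v_phi + k * dv_phi/dk

print(v_phase(k0), v_mod_direct, v_mod_sum)
```

Because the assumed `v_phase` has a negative slope, both estimates of the modulation velocity come out smaller than the phase velocity, which is the second of the two cases listed above.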


As  the  phase  velocity  varies,  the  refractive  index  varies  inversely  (faster  velocity  => 
smaller  index).  Variation  of  the  refractive  index  implies  a  change  in  the  refractive 
angle  of  light  entering  or  exiting  the  medium  (via  Snell’s  law).  Variation  of  refractive 
index  with  wavelength  implies  that  different  frequencies  will  refract  at  different  angles. 
This  is  the  principle  of  the  dispersing  prism. 


4.9.1  Example:  Dispersive  Traveling  Waves 


Consider a medium with dispersion relation of the form of a power law:

$$\omega(k) = \alpha k^{\ell}$$

where $\ell$ is a real number. The average and modulation velocities are:

$$v_{avg} = \frac{\omega}{k} = \frac{\alpha k^{\ell}}{k} = \alpha k^{\ell-1}$$

$$v_{mod} = \frac{d\omega}{dk} = \frac{d}{dk}\left(\alpha k^{\ell}\right) = \ell\left(\alpha k^{\ell-1}\right) = \ell \cdot v_{avg}.$$

So if $\ell > 1$, then $v_{mod} > v_{avg}$, and if $\ell < 1$, $v_{mod} < v_{avg}$. The first relation corresponds to anomalous dispersion and the second to normal dispersion. The dispersion relation for normal dispersion is nonlinear and concave down, while that for anomalous dispersion is nonlinear and concave up. Of course, for nondispersive waves the dispersion relation is linear.

[Figure: dispersion relations $\omega(k)$ plotted against $k$, one curve for normal dispersion and one for anomalous dispersion.]

Phase and modulation (group) velocities on the dispersion plot $\omega[k]$. The phase velocity at wavenumber $k_1$ is $\frac{\omega_1}{k_1}$, while the velocity of the modulation wave is the slope of the dispersion curve evaluated at $k_1$,

$$v_{mod} = \left.\frac{d\omega}{dk}\right|_{k=k_1}$$
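The power-law example can be evaluated for a few exponents to see the three regimes side by side. This is an illustrative sketch: the function names, the coefficient value, and the chosen exponents are assumptions, and the labels follow the classification used in the text ($\ell > 1$ anomalous, $\ell < 1$ normal).

```python
def power_law_velocities(k, alpha, ell):
    """Return (v_avg, v_mod) for the dispersion relation omega = alpha * k**ell."""
    omega = alpha * k ** ell
    v_avg = omega / k                       # phase (average) velocity
    v_mod = ell * alpha * k ** (ell - 1.0)  # slope d(omega)/dk
    return v_avg, v_mod


def classify(ell):
    """Label the regime using the convention stated in the text."""
    if ell > 1.0:
        return "anomalous"
    if ell < 1.0:
        return "normal"
    return "nondispersive"


for ell in (0.5, 1.0, 2.0):
    v_avg, v_mod = power_law_velocities(k=4.0, alpha=3.0, ell=ell)
    print(ell, classify(ell), v_avg, v_mod, v_mod / v_avg)
```

In each row the last column reproduces the exponent, which is the relation $v_{mod} = \ell \cdot v_{avg}$.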


In a medium with normal dispersion, the refractive index $n$ increases with frequency $\nu$ (or $\omega$) and decreases with wavelength $\lambda$. Therefore $n$ increases as the wavenumber $k$ increases, i.e., $\frac{dn}{dk} > 0$. Since $v_\phi = \frac{c}{n}$, the phase velocity then decreases with $k$, so that $\frac{dv_\phi}{dk} < 0$. Thus in real media, the average waves travel faster than the modulation wave. This also means that the signal impressed on an electromagnetic wave cannot travel faster than the speed of light.

Refractive index $n$ vs. wavelength $\lambda$ for several media, demonstrating the decrease in index (and thus increase in phase velocity) of light with increasing wavelength.


4.9.2  Propagation of Superposition of Waves in Nondispersive and Dispersive Media


Recall that an anharmonic, though periodic, oscillation can be expressed as a sum of harmonic terms of different frequencies, i.e., as a Fourier series. We can therefore find the effect of dispersion on an anharmonic traveling wave by decomposing it into its Fourier series of harmonic terms and propagating each separately at its own velocity. The resultant is found by resumming the resulting components. For example, if:

$$f[z,t] = s_1[z,t] + s_2[z,t] + s_3[z,t]$$

$$= A_1 \sin[k_1 z - \omega_1 t] + \frac{A_1}{3}\sin[k_2 z - 3\omega_1 t] + \frac{A_1}{5}\sin[k_3 z - 5\omega_1 t]$$

$$= A_1 \cos\left[k_1 z - \omega_1 t - \frac{\pi}{2}\right] + \frac{A_1}{3}\cos\left[k_2 z - 3\omega_1 t - \frac{\pi}{2}\right] + \frac{A_1}{5}\cos\left[k_3 z - 5\omega_1 t - \frac{\pi}{2}\right]$$

As we’ve already seen, $f[z,t]$ is the sum of the first three terms of a square wave. The wave at the source is a “blurry square wave,” as shown in (a), where the three wavelengths of the three waves are respectively 4 units, $\frac{4}{3}$ units, and $\frac{4}{5}$ units. In the nondispersive case, $\lambda_1 = 3\lambda_2 = 5\lambda_3$ and $\omega_1 = \frac{\omega_2}{3} = \frac{\omega_3}{5}$, which means in turn that $\frac{\omega_1}{k_1} = \frac{\omega_2}{k_2} = \frac{\omega_3}{k_3}$ and $v_1 = v_2 = v_3$. Since all components in the waveform propagate at the same velocity, the relative phase difference is maintained throughout and the “shape” of the waveform doesn’t change as it propagates, as shown in (b):

Propagation of a waveform in a nondispersive medium: (a) sum of three sinusoids to produce a “blurry” square wave (lowpass filtered); (b) resulting waveform after propagating all three sinusoids with the same phase velocity, so that all components travel the same distance. The waveform is undistorted.


In dispersive media, energy conservation requires that the temporal frequencies are unchanged ($\omega_1 = \frac{\omega_2}{3} = \frac{\omega_3}{5}$). However, the phase velocities are no longer equal ($v_1 \neq v_2 \neq v_3 \neq v_1$) and thus the wavelengths are no longer proportional (in other words, it is the oscillation frequency and not the wavelength that determines the “color” of the light). As the waves travel through the medium, their relative phases will vary, and the “shape” of the waveform will become increasingly distorted. If the dispersion is normal in the medium at these frequencies, then the lower-frequency sinusoid (e.g., $s_1[z,t]$ in this example) travels faster than a high-frequency sinusoid, so that $v_1 > v_2 > v_3$ in this example. Consider the resulting waveforms if the low-frequency component has moved 1 unit and 2 units from the case shown in (a) above:

Propagation of a waveform in a medium with normal dispersion. The same waveform above is assumed: (a) after the low-frequency term has propagated by one unit (the higher-frequency terms have moved shorter distances); (b) after the low-frequency term has propagated two units, showing that the distortion in the waveform has increased.

Of course, the behaviour of the individual components in anomalous dispersion is complementary; the high-frequency sinusoidal terms move faster ($v_1 < v_2 < v_3$); the distortion still exists, but it is in some sense “reversed.”

Propagation of a waveform in a medium with anomalous dispersion assuming the same “blurry” square wave used previously: (a) after the low-frequency term has propagated by one unit and the higher-frequency terms longer distances; (b) after the low-frequency term has propagated two units.
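The difference between the nondispersive and dispersive cases can be reproduced by displacing each of the three sinusoids by its own distance and resumming. The sketch below is an illustration only: the wavelengths 4, 4/3, and 4/5 units follow from $\lambda_1 = 3\lambda_2 = 5\lambda_3$, while the displacements 1.0, 0.9, and 0.8 units used for normal dispersion are assumed values, not taken from the figures.

```python
import math

WAVELENGTHS = (4.0, 4.0 / 3.0, 4.0 / 5.0)  # lambda_1 = 3*lambda_2 = 5*lambda_3
AMPLITUDES = (1.0, 1.0 / 3.0, 1.0 / 5.0)   # first three terms of a square wave


def waveform(z, shifts):
    """Sum the three sinusoids at position z, each displaced by its own shift."""
    return sum(
        amp * math.sin(2.0 * math.pi * (z - shift) / lam)
        for amp, lam, shift in zip(AMPLITUDES, WAVELENGTHS, shifts)
    )


def profile(shifts, offset=0.0, samples=64, length=4.0):
    """Sample the waveform over one fundamental period, optionally offset in z."""
    return [waveform(i * length / samples - offset, shifts) for i in range(samples)]


def max_difference(p, q):
    """Largest sample-by-sample difference between two profiles."""
    return max(abs(a - b) for a, b in zip(p, q))


# Reference: the source waveform translated rigidly by one unit.
rigid_shift = profile((0.0, 0.0, 0.0), offset=1.0)

# Nondispersive: every component travels the same distance (1 unit).
nondispersive = profile((1.0, 1.0, 1.0))

# Normal dispersion (assumed distances): higher-frequency terms travel less far.
dispersive = profile((1.0, 0.9, 0.8))

print(max_difference(nondispersive, rigid_shift))  # shape preserved
print(max_difference(dispersive, rigid_shift))     # shape distorted
```

Swapping the assumed distances so that the higher-frequency terms travel farther (for example 1.0, 1.1, 1.2) gives the anomalous case, with a distortion of similar size but opposite sense.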


4.9.3  Energy  and  Information  Transmission  in  Nondispersive 
and  Dispersive  Media 

The issue of differential wave velocities also is relevant to the propagation of energy, information, and “messages.” This concept is interesting in its own right, and also potentially confusing, so we’ll discuss it (albeit briefly). A good source on the subject is Chapter 6 of Waves, by Crawford. In amplitude modulation (e.g., AM radio), the information (speech or music, call it $s[t]$) multiplies (“modulates”) a high-frequency carrier wave $r[t]$:

$$f[t] = s[t]\cdot r[t] = s[t]\cdot\cos\left[\omega_{carrier}\cdot t\right] = s[t]\cdot\cos\left[2\pi\nu_{carrier}\cdot t\right]$$

The FCC decrees that the frequency of the carrier wave ($\nu_{carrier}$) lies in the range 500 kHz < $\nu_{carrier}$ < 1600 kHz, while the audio frequencies in $s[t]$ are much lower (20 Hz $\lesssim \nu_{audio} \lesssim$ 20 kHz). This signal radiates as a traveling wave either through the nondispersive vacuum of space or a normally dispersive medium of air (though the dispersion is small). Because the carrier frequency is so much larger than the signal frequency, the velocities of the average and modulation waves are:

$$v_{average} = \nu_{average}\cdot\lambda_{average} \cong \frac{\omega_{carrier}}{k_{carrier}}$$

$$v_{mod} = \nu_{mod}\cdot\lambda_{mod} \cong \left.\frac{d\omega}{dk}\right|_{k=k_{carrier}}$$

The information (speech or music) is carried by the modulation and travels through the medium at the modulation velocity, which we know to be less than the average velocity in a normally dispersive medium. An example is shown as “snapshots” of the product of a long-period sinusoidal modulation with frequency $\nu_1$ and period $\lambda_1$ (taken to be $\frac{8}{7}$ units in this example) and a short-period carrier wave with frequency $\nu_2$ and $\lambda_2$ ($= \frac{8}{9}$ units). The phase velocity of the higher-frequency wave is assumed to be approximately 93% of the velocity of the lower-frequency wave. The periods of the average and modulation waves are:

$$\lambda_{avg} = \frac{2\lambda_1\lambda_2}{\lambda_1+\lambda_2} = 1 \text{ unit}$$

$$\lambda_{mod} = \frac{2\lambda_1\lambda_2}{\left|\lambda_1-\lambda_2\right|} = 8 \text{ units}$$

The snapshots are taken at increments of $\Delta t = \frac{T_{avg}}{4}$, so that the average wave propagates by one-quarter period between images. A point of constant phase on the average wave is denoted in each image by the black dot, which is seen to travel faster than a point of constant phase on the modulation wave; in this case, the ratio of modulation velocity to average velocity is approximately:

$$\frac{v_{mod}}{v_{avg}} \cong 0.69$$

Thus the “information” travels about 70% as fast as the average wave.

Illustration of normal dispersion of two waves shown at increments of $\Delta t = \frac{T_{avg}}{4}$. The black dot marks a point on the average wave with the same phase, which moves faster than the corresponding point on the modulation wave.
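The numbers in this example can be explored with a short calculation. The sketch below makes two assumptions that should be read as such: the component wavelengths are taken as 8/7 and 8/9 units (the pair that reproduces an average wavelength of 1 unit and a modulation wavelength of 8 units), and the velocity ratio is taken as 0.93, which the text quotes only approximately.

```python
import math


def two_wave_summary(lam1, lam2, v1, v2):
    """Average/modulation wavelengths and velocity ratio for two sinusoids.

    lam1, lam2 are the component wavelengths; v1, v2 their phase velocities.
    """
    k1, k2 = 2.0 * math.pi / lam1, 2.0 * math.pi / lam2
    w1, w2 = v1 * k1, v2 * k2                 # omega = v_phi * k for each component
    lam_avg = 2.0 * lam1 * lam2 / (lam1 + lam2)
    lam_mod = 2.0 * lam1 * lam2 / abs(lam1 - lam2)
    v_avg = (w1 + w2) / (k1 + k2)
    v_mod = (w1 - w2) / (k1 - k2)
    return lam_avg, lam_mod, v_mod / v_avg


# Assumed inputs: wavelengths 8/7 and 8/9 units, velocity ratio 0.93.
lam_avg, lam_mod, ratio = two_wave_summary(8.0 / 7.0, 8.0 / 9.0, 1.0, 0.93)
print(lam_avg, lam_mod, ratio)
```

With these inputs the velocity ratio comes out near 0.7, in line with the statement that the information travels about 70% as fast as the average wave; the exact figure is sensitive to the assumed velocity ratio.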


Chapter  5 


Review:  Doppler  Effect 


5.1  Transition from Acoustic Waves to Electromagnetic Waves

HR  §  40 

The change in the frequency of a sound wave due to relative motion of the source and/or receiver is very familiar; the rise in pitch of an approaching locomotive airhorn, and the drop in pitch as it recedes, is a common example. This effect was described mathematically by Christian Doppler in 1842, and is naively understood by many people. However, few realize the fundamental difference between the Doppler effect due to source motion and that due to receiver motion.


5.1.1  Acoustic  Doppler  Effect,  Source  at  Rest 


CASE  I  Motion  of  Receiver,  Source  and  Medium  at  Rest 

Consider a point source of sound in air which emits a frequency $\nu$. The receiver moves relative to the source at velocity $v_0$. Since the source and medium are at rest, the sound has a wavelength $\lambda = \frac{v}{\nu}$, where $v$ is the velocity of sound in air ($\cong 330\ \frac{m}{s}$ at STP). Since the source is at rest, the wavefronts expand uniformly from the source. A receiver traveling toward (away from) the source passes more (fewer) peaks of the sound wave in a given time interval than (s)he would were (s)he stationary. Therefore, the receiver hears a higher (lower) pitch.

A “snapshot” of the source, receiver, and traveling wave if the observer moves towards the source.


This is shown in a snapshot of the source, receiver, and the emerging wavefronts (a wavefront is the locus of points of constant phase on a wave). The number of wave peaks heard per unit time is the observed frequency $\nu'$, and equals the source frequency plus (minus) the number of extra cycles heard due to observer motion:

$$\nu' = \nu \pm \Delta\nu = \nu \pm \frac{v_0}{\lambda} = \nu\left(1 \pm \frac{v_0}{v}\right)$$

The + sign means that the receiver approaches the source.

Example: $\nu$ = 1000 Hz, $v_0$ = 60 mph = 88 fps = 26.8 $\frac{m}{s}$ toward source

$$\nu' = 1000\text{ Hz}\cdot\left(1+\frac{26.8}{330}\right) \cong 1000\text{ Hz}\cdot 1.081 = 1081\text{ Hz} > 1000\text{ Hz}$$
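The moving-receiver formula is easy to evaluate directly. This is a minimal sketch; the function name, the sign convention (positive speed means approach), and the default sound speed of 330 m/s are choices made here for illustration.

```python
MPH_TO_M_PER_S = 0.44704  # exact conversion: 1 mile = 1609.344 m, 1 hour = 3600 s


def doppler_moving_receiver(nu, v_receiver, v_sound=330.0):
    """Observed frequency for a receiver moving through still air.

    nu         : source frequency in Hz
    v_receiver : receiver speed in m/s, positive when approaching the source
    v_sound    : speed of sound in m/s (330 m/s is the value used in the text)
    """
    return nu * (1.0 + v_receiver / v_sound)


approaching = doppler_moving_receiver(1000.0, 60.0 * MPH_TO_M_PER_S)
receding = doppler_moving_receiver(1000.0, -60.0 * MPH_TO_M_PER_S)
print(round(approaching), round(receding))
```

The upward and downward shifts are equal in size for a moving receiver, because the wavelength in the medium is unchanged and only the rate of crossing wavefronts differs.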


5.1.2  Acoustic  Doppler  Effect  —  Source  in  Motion 


CASE II Source Moves in Medium, Receiver Stationary

Doppler effect for sound waves if the source moves towards the observer. The circles represent wavefronts emitted by the source at different times, showing that they do not have a common center of symmetry.

Again, this is a snapshot of the wavefronts emitted by a source moving toward the receiver with velocity $v_s$. The wavefronts emitted at later times have less distance to travel to the observer. The distance between adjacent wavefronts in the medium is actually shortened on one side and lengthened on the other, i.e.,

$$\lambda' = \lambda \mp \Delta\lambda = \lambda \mp \frac{v_s}{\nu}, \text{ where the negative sign} \implies \text{source approaching observer.}$$

Therefore:

$$\nu' = \frac{v}{\lambda'} = \frac{v}{\lambda \mp \Delta\lambda} = \frac{v}{\lambda\left(1 \mp \frac{v_s}{v}\right)} = \nu\cdot\frac{v}{v \mp v_s}$$

where $v$ is the velocity of the wave in the medium. For example, with $\nu$ = 1000 Hz, $v_s$ = 60 mph = 26.8 $\frac{m}{s}$ toward observer:

$$\nu' = 1000\text{ Hz}\cdot\frac{330\ \frac{m}{s}}{330\ \frac{m}{s} - 26.8\ \frac{m}{s}} = 1000\text{ Hz}\cdot\frac{330}{303.2} \cong 1088\text{ Hz} > 1000\text{ Hz}$$

In the case of the source moving in the medium, the frequency is measurably different from that for the case of the observer moving (1088 Hz vs. 1081 Hz); this difference can be detected to determine whether the observer or the source is moving relative to the medium.
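The gap between the two cases can be computed, and it turns out to be second order in the ratio of the speed to the sound speed. The sketch below uses the worked values from the text (1000 Hz, 26.8 m/s, 330 m/s); the function name and the closed form for the gap, $\nu\beta^2/(1-\beta)$ with $\beta = v_s/v$, are additions for illustration that follow algebraically from the two formulas.

```python
def doppler_moving_source(nu, v_source, v_sound=330.0):
    """Observed frequency for a source moving through still air.

    v_source is positive when the source approaches the stationary receiver.
    """
    if abs(v_source) >= v_sound:
        raise ValueError("formula assumes the source is slower than sound")
    return nu * v_sound / (v_sound - v_source)


nu, speed, v_sound = 1000.0, 26.8, 330.0
beta = speed / v_sound

source_moving = doppler_moving_source(nu, speed, v_sound)
receiver_moving = nu * (1.0 + beta)  # moving-receiver result, for comparison

gap = source_moving - receiver_moving
gap_closed_form = nu * beta ** 2 / (1.0 - beta)
print(round(source_moving), round(receiver_moving), gap, gap_closed_form)
```

Since the gap scales as $\beta^2$, the two cases become hard to distinguish at low speeds, while the first-order shift $\nu\beta$ is the same in both.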


5.1.3  Acoustic  Doppler  Effect  —  Both  Source  and  Receiver 
Moving 

CASE III Both Source and Receiver Moving, Medium at Rest

If both source and receiver move relative to the medium, the frequency is a combination of the two results:

$$\nu' = \nu\cdot\frac{v \pm v_0}{v \mp v_s}$$

upper signs $\implies$ source and receiver approach each other


5.2  Doppler  Effect  for  Light 

5.2.1  Difference  between  Light  and  Sound 

Because the Doppler effect for sound differs if the source moves rather than the observer, it is possible to determine which is moving relative to the medium. If the observer moves, the wavelength $\lambda$ in the medium is invariant and the change in pitch is due to the more-or-less frequent passages of the wavefronts by the observer. If the source moves, the wavelength of the sound in the medium changes and the sign of the change depends on the direction of source motion. If this new wavelength is $\lambda'$, then the new frequency is $\nu' = \frac{v}{\lambda'}$.

For  light  waves  (electromagnetic  radiation),  the  mechanism  of  wave  propagation 
(and  hence  of  the  Doppler  effect)  is  fundamentally  different  from  propagation  of  sound 
in  air.  Because  of  this  big  difference,  light  propagation  was  not  successfully  described 
until  1864,  when  James  Clerk  Maxwell  collected  and  interpreted  the  four  equations 
which  bear  his  name.  The  true  nature  of  light  was  not  generally  accepted  until  post- 
1880.  Why  is  light  so  different? 

Recall that two ingredients are required to sustain oscillations or propagate waves: (1) inertia; (2) a restoring force. For waves in common everyday experience (e.g., sound in air, surface waves in water), inertia is supplied by the source (air motion from the diaphragm, physical displacement of the water surface). The restoring force is due to a characteristic of the medium of transmission (e.g., air pressure, gravity plus surface tension).

By the early 1800’s, some characteristics of light were already known, e.g., the phase velocity $c$ was known to be finite. The first recorded experiment to measure $c$ was performed by Galileo around 1600. He stationed a man with a shuttered lantern on a distant hill with instructions to open the shutter as soon as he saw the light from Galileo’s lamp. By timing the interval $\Delta t$ between unshuttering his lamp and seeing the return beam, Galileo tried to measure $c$ via $c = \frac{2L}{\Delta t}$, where $L$ is the distance between lanterns. His conclusion:

“If  not  instantaneous,  light  is  extraordinarily  rapid.  ” 

A surprisingly good measurement of $c$ was made by Ole Romer in 1675. The Keplerian laws of planetary motion enabled Romer to predict the times of eclipse of Jupiter’s Galilean satellites. He found that the measured times did not agree with prediction: when Jupiter was closest to earth, the times of eclipse were early, and when Jupiter was distant the times were late. Romer ascribed the difference to a finite velocity of light, and computed a value of $c = 2\cdot 10^{8}\ \frac{m}{s}$. The largest source of error was Romer’s lack of knowledge about the earth’s orbital velocity. When corrected for this error, Romer’s method yields a very accurate value of $3\cdot 10^{8}\ \frac{m}{s}$. Besides its velocity, light had been demonstrated to have the character of a wave by Newton’s demonstration of dispersion by a prism and by the polarization experiments of Fresnel. These characteristics led to Fresnel’s hypothesis of the “aether”, the medium of transmission for light, which is analogous to air for propagation of sound. If it exists, the aether must be present everywhere, including in vacuum. From the calculations of the Doppler effect, the frequency shift of light must then depend on whether the source or the observer is moving.

The need for the aether was eliminated by Maxwell (as we shall soon see), and its existence was disproved by Michelson and Morley in 1887 when they demonstrated that the velocity of light is identical parallel to or perpendicular to the orbital motion of the earth, which would not have been true had an aether been necessary for propagation.

Einstein  used  Michelson’s  results  to  derive  the  Special  Theory  of  Relativity,  which 
states: 

“The velocity of light is constant, regardless of the motion of the source or the observer. In addition, there is no preferred frame of reference.”

Therefore  when  considering  light,  the  Doppler  effect  should  yield  identical  results 
if  the  source  is  moving  or  if  the  observer  is  moving.  In  fact,  it  is  impossible  to  define 
which  moves;  only  the  relative  motion  is  meaningful. 

Einstein’s result is:

$$\nu' = \nu\,\frac{1-\frac{v}{c}}{\sqrt{1-\left(\frac{v}{c}\right)^2}} = \nu\left(1-\frac{v}{c}\right)\left[1-\left(\frac{v}{c}\right)^2\right]^{-\frac{1}{2}}$$

where the square root may be approximated via applying the well-known power series:

$$(1+u)^n = \frac{1}{0!} + \frac{n}{1!}u + \frac{n(n-1)}{2!}u^2 + \frac{n(n-1)(n-2)}{3!}u^3 + \cdots$$

In this case, with $u = -\left(\frac{v}{c}\right)^2$ and $n = -\frac{1}{2}$, the series solution is:

$$\nu' = \nu\left(1-\frac{v}{c}\right)\left[1 + \frac{1}{2}\left(\frac{v}{c}\right)^2 + \frac{3}{8}\left(\frac{v}{c}\right)^4 + \cdots\right]$$

In the case of light, its velocity $c \gg v$, so we can dispense with the terms with orders larger than unity:

$$\nu' \cong \nu\left(1-\frac{v}{c}\right) = \nu - \nu\frac{v}{c} \text{ for } v \ll c.$$

Thus if the relative velocity of source and observer is positive (so that the distance increases), then the Doppler shift decreases the frequency (and increases the wavelength) by an amount that is proportional to $v$. This is the famous “red shift” in astronomy. Obviously the complementary solution also applies if the source and observer approach each other to produce a “blue shift.”
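The quality of the first-order approximation can be checked against the exact expression. This sketch assumes the standard longitudinal relativistic Doppler formula with $v > 0$ meaning that source and observer separate; the function names and the test speed of 1% of $c$ are illustrative choices.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s (exact by definition)


def doppler_light_exact(nu, v):
    """Relativistic Doppler shift; v > 0 means source and observer separate."""
    beta = v / C
    return nu * (1.0 - beta) / math.sqrt(1.0 - beta ** 2)


def doppler_light_first_order(nu, v):
    """Approximation valid for v much smaller than c: nu' ~ nu * (1 - v/c)."""
    return nu * (1.0 - v / C)


nu = 5.0e14          # an optical frequency in Hz (illustrative)
v = 0.01 * C         # recession at 1% of the speed of light

exact = doppler_light_exact(nu, v)
approx = doppler_light_first_order(nu, v)
relative_error = abs(exact - approx) / nu
print(exact / nu, approx / nu, relative_error)
```

At 1% of $c$ the two results differ by about half of $(v/c)^2$, which is the size of the first neglected term in the series.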


Chapter  6 


Maxwell’s Equations for
Electromagnetic Waves

6.1  Vector  Operations 

Any physical or mathematical quantity whose amplitude may be decomposed into “directional” components often is represented conveniently as a vector. In this discussion, vectors are denoted by bold-faced underscored lower-case letters, e.g., $\mathbf{x}$. The usual notation for a vector with $N$ elements is a column of $N$ individual numerical scalars, where $N$ is the dimensionality of the vector. For example, the 3-D vector $\mathbf{x}$ is specified by a vertical column of the three ordered numerical components:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

Both real- and complex-valued scalars will be used as the components $x_n$ with the same notation. If the $x_n$ are real, then the vector $\mathbf{x}$ specifies a location in 3-D Cartesian space. The individual scalar components $x_1$, $x_2$, and $x_3$ are equivalent to the distances along the three axial directions (commonly labeled $x$, $y$, and $z$, respectively, in the space domain). In common situations, the components of the vector $\mathbf{x}$ have dimensions of length, but other representations are possible. For example, we shall often use a convenient representation of a sinusoid in the $x$-$y$ plane that is specified by a vector whose components have the dimensions of spatial frequency (e.g., cycles per mm).

To minimize any confusion resulting from the use of the symbol “x” to represent both a vector and a particular component of a vector, a normal-faced “$x$” with a subscript will be used to indicate the $i$th component of the vector $\mathbf{x}$, while the bold-faced subscripted symbol “$\mathbf{x}_i$” denotes the $i$th member of a set of vectors. Other notations also will be employed during certain aspects of the discussion, but these cases will be explicitly noted.

Definitions of the algebraic operations of vectors will be essential to this discussion. For example, the sum of two $N$-D vectors $\mathbf{x}$ and $\mathbf{y}$ is generated by summing the pairs of corresponding components:

$$\mathbf{x}+\mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} x_1+y_1 \\ x_2+y_2 \\ \vdots \\ x_N+y_N \end{bmatrix}$$

The notation “x” and “y” used here merely distinguishes between the two vectors and their components; they are not references to the $x$- and $y$-coordinates of 2-D or 3-D space. Note that this definition implies that two vectors must have the same dimension for their sum to exist.

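The definition of the difference of two vectors is evident from the equation for the sum:

$$\mathbf{x}-\mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} x_1-y_1 \\ x_2-y_2 \\ \vdots \\ x_N-y_N \end{bmatrix}$$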

Obviously, if the number of dimensions $N$ of the vector is 1, 2, or 3, then the corresponding vector $\mathbf{x}$ specifies a location on a line, on a plane, or within a volume, respectively. This interpretation of a vector as the location of a point in space is so pervasive and intuitive that it may obscure other useful and perhaps more general interpretations of vectors and vector components. For example, we can use the vector notation to represent a two-dimensional (2-D) sampled object. Such an object may be represented as an $N \times N$ array of samples or, by “stacking” the $N$ columns, as a 1-D vector with $N^2$ components. This stacking process is known as lexicographic ordering of the matrix. Such a representation often is used when constructing computer algorithms for processing digital images, but will not be considered further here.

The transpose of the column vector $\mathbf{x}$ is the same set of scalar components arrayed as a horizontal row, and is denoted in this discussion by a superscript $T$; another common notation uses an overscored tilde:

$$\mathbf{x}^T = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}$$

By analogy with the usual interpretation of a vector in Cartesian space, the length of a vector with real-valued components is a real-valued scalar computed from the 2-D or 3-D “Pythagorean” sum of the components:

$$\sum_{n=1}^{N} x_n^2 = \left|\mathbf{x}\right|^2$$

The result is the squared magnitude of the vector. The vector’s length, or norm, is the square root of Eq.(3.5), as shown in the figure, and thus also is real valued.

Length, or “norm”, of 2-D vector with real-valued components.

$$\left|\mathbf{x}\right| = \sqrt{\sum_{n=1}^{N} x_n^2}$$


From this definition, it is evident that the norm of a vector must be nonnegative ($\left|\mathbf{x}\right| \geq 0$) and that it is zero only if all scalar components of the vector are zero.

Vectors with unit length will be essential in the discussion of transformations into alternate representations. Such a unit vector often is indicated by an overscored caret. The unit vector pointing in the direction of any vector $\mathbf{x}$ may be generated by dividing each component of $\mathbf{x}$ by the scalar length $\left|\mathbf{x}\right|$ of the vector:

$$\hat{\mathbf{x}} = \frac{\mathbf{x}}{\left|\mathbf{x}\right|}$$
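The norm and the unit vector translate directly into a few lines of code. This is a minimal sketch using plain Python lists for real-valued vectors; the function names are illustrative, and the zero vector is rejected because it has no direction.

```python
import math


def norm(x):
    """Length of a real-valued vector: square root of the sum of squared components."""
    return math.sqrt(sum(xn * xn for xn in x))


def unit_vector(x):
    """Vector of unit length pointing in the direction of x."""
    length = norm(x)
    if length == 0.0:
        raise ValueError("the zero vector has no direction")
    return [xn / length for xn in x]


x = [3.0, 4.0]
x_hat = unit_vector(x)
print(norm(x), x_hat, norm(x_hat))
```

The same functions work unchanged for any dimension $N$, since both operate component by component.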


The squared-magnitude operation is the first example of the vector scalar product (also called the dot product), which defines a “product” of two vectors of the same dimension that generates a scalar. Following common mathematical notation, the scalar-product operation will be denoted by a “dot” ($\bullet$) between the symbols for the vectors. The process also may be written as the transpose of $\mathbf{x}$ multiplied from the right by $\mathbf{x}$. Therefore, the scalar product of a vector $\mathbf{x}$ with itself may be written in equivalent ways:

$$\left|\mathbf{x}\right|^2 = \left(\mathbf{x}\bullet\mathbf{x}\right) = \mathbf{x}^T\mathbf{x} = \sum_{n=1}^{N} x_n^2$$


6.1.1  Scalar  Product  of  Two  Vectors 

It is easy to generalize the squared magnitude operation to apply to distinct vectors $\mathbf{a}$ and $\mathbf{x}$ that have real-valued components and that have the same dimension $N$:

$$\mathbf{a}\bullet\mathbf{x} = \mathbf{a}^T\mathbf{x} = \begin{bmatrix} a_1 & a_2 & \cdots & a_N \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} = a_1x_1 + a_2x_2 + \cdots + a_Nx_N = \sum_{n=1}^{N} a_n x_n$$

In words, the scalar product of two vectors is obtained by multiplying pairs of vector components with the same indices and summing these products. Note that the scalar product of two distinct vectors may be positive, negative, or zero, whereas the squared magnitude of a vector must be nonnegative. From these equivalent mathematical expressions, it is apparent that the scalar product of vectors with real-valued components is identical in either order:

$$\mathbf{a}\bullet\mathbf{x} = \mathbf{x}\bullet\mathbf{a}$$

Any  process  that  performs  an  action  between  two  entities  and  that  may  be  performed 
in  either  order  is  commutative.  The  simple  concept  of  the  scalar  product  is  the  basis 
(future  pun  intended)  for  some  very  powerful  tools  for  describing  vectors  and,  after 
appropriate  generalization,  for  functions  of  continuous  variables.  The  features  of  the 
various  forms  of  scalar  product  are  the  subject  of  much  of  the  remainder  of  this 
chapter. 

The scalar product of an arbitrary “input” vector $\mathbf{x}$ with a “reference” vector $\mathbf{a}$ has the form of an operator acting on $\mathbf{x}$ to produce a scalar $g$. The appropriate process was just defined:

$$\mathcal{O}\{\mathbf{x}\} = \mathbf{a}\bullet\mathbf{x} = \sum_{n=1}^{N} a_n x_n = g$$

It is apparent that a multiplicative scale factor $k$ applied to each component of the real-valued input vector $\mathbf{x}$ results in the same scaling of the output scalar:

$$\mathcal{O}\{k\,\mathbf{x}\} = \sum_{n=1}^{N} a_n\left(k\,x_n\right) = k\sum_{n=1}^{N} a_n x_n = k\,g$$

which demonstrates that the scalar product “operator” satisfies the scaling part of the linearity condition; additivity follows in the same way, because the sum distributes over the components of two input vectors.
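Commutativity and the two parts of linearity (scaling and additivity) can be confirmed on a small example. This is a sketch with illustrative vectors chosen here; `dot` is a plain implementation of the sum of products of corresponding components.

```python
def dot(a, x):
    """Scalar product of two real-valued vectors of the same dimension."""
    if len(a) != len(x):
        raise ValueError("vectors must have the same dimension")
    return sum(an * xn for an, xn in zip(a, x))


a = [1.0, -2.0, 3.0]   # "reference" vector
x = [4.0, 0.5, -1.0]   # "input" vector
y = [2.0, 2.0, 2.0]    # second input, used to check additivity
k = 2.5                # scale factor

g = dot(a, x)                                        # 4 - 1 - 3 = 0
scaled = dot(a, [k * xn for xn in x])                # equals k * g
summed = dot(a, [xn + yn for xn, yn in zip(x, y)])   # equals dot(a, x) + dot(a, y)
print(g, dot(x, a), scaled, summed)
```

The example also shows that the scalar product of two nonzero vectors can be exactly zero, which is not possible for the squared magnitude of a nonzero vector.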

The geometrical interpretation of a 2-D vector as the endpoint of a line drawn from the origin on the 2-D plane leads to an alternate expression for the scalar product of two vectors. It is convenient to use 2-D vectors denoted by $\mathbf{f}_n$ with Cartesian components $[x_n, y_n]$, or represented in polar coordinates by the length $\left|\mathbf{f}_n\right|$ and the azimuth angles $\theta_n$. The geometric picture of the vector establishes the relationship between the polar and Cartesian representations to be:

$$\mathbf{f}_n = [x_n, y_n] = \left[\left|\mathbf{f}_n\right|\cos[\theta_n],\ \left|\mathbf{f}_n\right|\sin[\theta_n]\right]$$

where, in this case, $x_n$ and $y_n$ represent the $x$- and $y$-coordinates of the vector $\mathbf{f}_n$. The scalar product of two such vectors $\mathbf{f}_1$ and $\mathbf{f}_2$ is obtained by applying the definition and casting into a different form by using the well-known trigonometric identity for the cosine of the difference of two angles:

$$\mathbf{f}_1\bullet\mathbf{f}_2 = x_1x_2 + y_1y_2$$

$$= \left(\left|\mathbf{f}_1\right|\cos[\theta_1]\right)\left(\left|\mathbf{f}_2\right|\cos[\theta_2]\right) + \left(\left|\mathbf{f}_1\right|\sin[\theta_1]\right)\left(\left|\mathbf{f}_2\right|\sin[\theta_2]\right)$$

$$= \left|\mathbf{f}_1\right|\left|\mathbf{f}_2\right|\left(\cos[\theta_1]\cos[\theta_2] + \sin[\theta_1]\sin[\theta_2]\right)$$

$$= \left|\mathbf{f}_1\right|\left|\mathbf{f}_2\right|\cos[\theta_1-\theta_2] = \left|\mathbf{f}_1\right|\left|\mathbf{f}_2\right|\cos\left[\left|\theta_1-\theta_2\right|\right]$$

where the symmetry of the cosine function has been used in the last step. In words, the scalar product of two 2-D vectors is equal to the product of the lengths of the vectors and the cosine of the included angle $\theta_1-\theta_2$. The knowledgeable reader is aware that this result has been obtained by circular reasoning; we are defining the scalar product form by using the Cartesian components of polar vectors, which were themselves determined by scalar products with the Cartesian basis vectors. This quandary is due in part to the familiarity of these concepts. Rather than resolve the issue from first principles, we will instead “sweep it under the rug” while continuing to use our existing intuition as a springboard to generalize these concepts to other applications. For example, it is easy now to generalize the scalar product to real-valued vectors $\mathbf{a}$ and $\mathbf{x}$ with arbitrary dimension $N$:

$$\mathbf{a}\bullet\mathbf{x} = \left|\mathbf{a}\right|\left|\mathbf{x}\right|\cos[\theta_a-\theta_x] = \left|\mathbf{a}\right|\left|\mathbf{x}\right|\cos[\theta]$$

where $\theta$ represents the “included” angle between the two $N$-D vectors. This angle is measured in the 2-D plane defined by the two vectors. If we consider the 3-D analogy of two vectors from the origin to the surface of a sphere, then the angle $\theta$ represents the angle along the “great circle” that connects the two vector tips.
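Solving the last expression for $\theta$ gives a way to compute the included angle in any dimension. The following is a sketch with illustrative inputs; the function name is an assumption, and the cosine is clamped to $[-1, 1]$ only to guard against floating-point rounding.

```python
import math


def included_angle(a, x):
    """Included angle (radians) between two nonzero real-valued vectors."""
    if len(a) != len(x):
        raise ValueError("vectors must have the same dimension")
    dot = sum(an * xn for an, xn in zip(a, x))
    len_a = math.sqrt(sum(an * an for an in a))
    len_x = math.sqrt(sum(xn * xn for xn in x))
    if len_a == 0.0 or len_x == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    cos_theta = max(-1.0, min(1.0, dot / (len_a * len_x)))  # guard against rounding
    return math.acos(cos_theta)


angle_2d = included_angle([1.0, 1.0], [1.0, 0.0])
angle_4d = included_angle([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0])
right_angle = included_angle([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(math.degrees(angle_2d), math.degrees(angle_4d), math.degrees(right_angle))
```

The 4-D case has no direct picture, but the angle is still well defined because the two vectors span an ordinary 2-D plane.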

This last definition for the scalar product may be used to derive the Schwarz inequality for vectors by recognizing that $\cos[\theta] \leq 1$:

$$\mathbf{a}\bullet\mathbf{x} \leq \left|\mathbf{a}\right|\left|\mathbf{x}\right|$$

The equality is satisfied only for vectors $\mathbf{a}$ and $\mathbf{x}$ that “point” in the same direction, which means that the ratios of the corresponding components of $\mathbf{a}$ and $\mathbf{x}$ are equal, and that the included angle $\theta = 0$ radians, which means that the vectors are scaled replicas. Note both the similarity and difference between the Schwarz inequality and triangle inequality for vectors:

$$\left|\mathbf{a}+\mathbf{x}\right| \leq \left|\mathbf{a}\right| + \left|\mathbf{x}\right|$$

In words, the Schwarz inequality says that the scalar product of two vectors can be no larger than the product of their lengths, while the triangle inequality establishes that one side of a triangle can be no longer than the sum of the other two sides. Both relations are illustrated in the figure.


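[Figure: Graphical comparison of Schwarz’ and the triangle inequalities for the same pair of 2-D vectors x and a. Left panel (Schwarz’ inequality): $\mathbf{a} \cdot \mathbf{x} = |\mathbf{a}|\,|\mathbf{x}| \cos[\theta] \le |\mathbf{a}|\,|\mathbf{x}|$. Right panel (triangle inequality): $|\mathbf{a} + \mathbf{x}| \le |\mathbf{a}| + |\mathbf{x}|$.]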


The Schwarz inequality may be combined with the definition of the unit vector to obtain an expression for the included angle between two unit vectors:

\[ \frac{\mathbf{a}}{|\mathbf{a}|} \cdot \frac{\mathbf{x}}{|\mathbf{x}|} = \hat{\mathbf{a}} \cdot \hat{\mathbf{x}} = \cos[\theta] \le 1 \]
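The relations above can be checked numerically. The following is a minimal sketch in plain Python, assuming two arbitrary 4-D example vectors (the values and the helper names `dot`, `norm`, and `included_angle` are illustrative, not from the text); it evaluates the included angle from the normalized scalar product and tests both inequalities.

```python
import math

def dot(a, x):
    """Scalar product of two real-valued N-D vectors."""
    return sum(ai * xi for ai, xi in zip(a, x))

def norm(a):
    """Length |a| of a vector."""
    return math.sqrt(dot(a, a))

def included_angle(a, x):
    """Included angle (radians) from cos[theta] = (a . x) / (|a| |x|)."""
    c = dot(a, x) / (norm(a) * norm(x))
    c = max(-1.0, min(1.0, c))  # guard against rounding slightly outside [-1, 1]
    return math.acos(c)

# Arbitrary example vectors with N = 4 (assumed values)
a = [3.0, 4.0, 0.0, 0.0]
x = [1.0, 0.0, 2.0, 2.0]

theta = included_angle(a, x)
schwarz_ok = dot(a, x) <= norm(a) * norm(x)
a_plus_x = [ai + xi for ai, xi in zip(a, x)]
triangle_ok = norm(a_plus_x) <= norm(a) + norm(x)

print(theta, schwarz_ok, triangle_ok)
```

For these vectors the scalar product is 3 and the lengths are 5 and 3, so the cosine of the included angle is 0.2.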


6.1.2  Cross  Product 

Consider the area of the parallelogram formed by two vectors A and B, as shown:

[Figure: Parallelogram formed by vectors A and B, with Area = $|\mathbf{A}|\,|\mathbf{B}| \sin[\theta]$.]

The area $|\mathbf{A}|\,|\mathbf{B}| \sin[\theta]$ may be computed as a 3-D vector that points perpendicular to the two component vectors with length equal to the area; the calculation is the “cross product” of the two 3-D vectors. Given the two component vectors:

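\[ \mathbf{A} = \hat{\mathbf{x}} A_x + \hat{\mathbf{y}} A_y + \hat{\mathbf{z}} A_z \]
\[ \mathbf{B} = \hat{\mathbf{x}} B_x + \hat{\mathbf{y}} B_y + \hat{\mathbf{z}} B_z \]

the cross product is defined:

\[ \mathbf{A} \times \mathbf{B} = \det \begin{bmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ A_x & A_y & A_z \\ B_x & B_y & B_z \end{bmatrix} = \hat{\mathbf{x}} (A_y B_z - A_z B_y) + \hat{\mathbf{y}} (A_z B_x - A_x B_z) + \hat{\mathbf{z}} (A_x B_y - A_y B_x) \]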

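In the example given, $\mathbf{A} = \hat{\mathbf{x}} |\mathbf{A}|$ and $\mathbf{B} = \hat{\mathbf{x}} (|\mathbf{B}| \cos[\theta]) + \hat{\mathbf{y}} (|\mathbf{B}| \sin[\theta])$, so that $A_z = B_z = 0$:

\[ \mathbf{A} \times \mathbf{B} = \det \begin{bmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ |\mathbf{A}| & 0 & 0 \\ |\mathbf{B}| \cos[\theta] & |\mathbf{B}| \sin[\theta] & 0 \end{bmatrix} = \hat{\mathbf{z}} \left( |\mathbf{A}|\,|\mathbf{B}| \sin[\theta] \right) \]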

It is easy to see that:

\[ \mathbf{B} \times \mathbf{A} = -\mathbf{A} \times \mathbf{B} \]

Note that the cross product is defined for 3-D vectors ONLY (though we can apply the definition to 2-D vectors by considering their component in the third direction to be zero; see the example for curl that follows later).
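As a quick check of the determinant formula, here is a minimal Python sketch; the lengths (2 and 3) and the angle (30 degrees) are assumed example values and `cross` is an illustrative helper. It builds A along the x axis and B in the x-y plane, as in the example above, and confirms that the cross product points along z with length equal to the parallelogram area and that reversing the order flips the sign.

```python
import math

def cross(a, b):
    """Cross product of two 3-D vectors from the determinant expansion."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)

theta = math.radians(30.0)          # assumed included angle
len_a, len_b = 2.0, 3.0             # assumed lengths |A| and |B|

A = (len_a, 0.0, 0.0)
B = (len_b * math.cos(theta), len_b * math.sin(theta), 0.0)

AxB = cross(A, B)
BxA = cross(B, A)
area = len_a * len_b * math.sin(theta)   # area of the parallelogram

print(AxB, BxA, area)
```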

6.1.3  Triple  Vector  Product 

The “triple vector product” is the cross product of two 3-D vectors (call them A and B) crossed with a third vector (C). The result may be evaluated by straightforward (yet tedious!) calculation and produces the result:

\[ (\mathbf{A} \times \mathbf{B}) \times \mathbf{C} = \mathbf{B} (\mathbf{C} \cdot \mathbf{A}) - \mathbf{A} (\mathbf{B} \cdot \mathbf{C}) = \mathbf{B} (\mathbf{C} \cdot \mathbf{A}) - \mathbf{A} (\mathbf{C} \cdot \mathbf{B}) \]

where the fact that the scalar product commutes for vectors with real-valued components has been used. The triple vector product yields the difference of two scaled replicas of the first two vectors, where the scaling factors are the scalar products of C with A and B. The “output” is a vector, as it must be.

We  will  use  this  expression  for  the  triple  vector  product  to  evaluate  the  “curl  of 
the  curl”  shortly. 
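The identity can be spot-checked with integer vectors, where the arithmetic is exact. This is a sketch only: the three vectors are arbitrary assumed values, and the grouping (A x B) x C follows the verbal definition given above. It also shows that the other grouping, A x (B x C), gives a different vector, so the order of operations matters.

```python
def dot(a, b):
    """Scalar product of two 3-D vectors."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """Cross product of two 3-D vectors."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def scale(s, a):
    """Multiply vector a by the scalar s."""
    return tuple(s * p for p in a)

def subtract(a, b):
    """Component-wise difference a - b."""
    return tuple(p - q for p, q in zip(a, b))

# Arbitrary integer example vectors (assumed values)
A = (1, 2, 3)
B = (-2, 0, 5)
C = (4, -1, 2)

lhs = cross(cross(A, B), C)                              # (A x B) x C
rhs = subtract(scale(dot(C, A), B), scale(dot(B, C), A))  # B (C.A) - A (B.C)
other_grouping = cross(A, cross(B, C))                   # A x (B x C)

print(lhs, rhs, other_grouping)
```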


6.2  Vector  Calculus 

Feynman,  Lectures  on  Physics  II,  Chapter  2,3 

In 1864, James Clerk Maxwell published a paper on the dynamics of electromagnetic fields, in which he collected four previously described equations which relate electric and magnetic forces, modified one (by adding a term to remove an inconsistency), and combined them to demonstrate the true nature of light waves. He demonstrated that the amplitudes of the electric and magnetic fields would decrease as the reciprocal of the distance (rather than the square of the reciprocal of the distance, as is true for static electric fields). In this way, an electric current in one location has a much larger effect on a distant electric charge than a static electric charge at the same location as the current.

The four equations are now collected into a group that bears his name. To interpret the four Maxwell equations, we must first understand some concepts of differential vector calculus, which may seem intimidating but is really just an extension of normal differentiation applied to scalar and vector fields. For our purposes, a scalar field is a description of scalar values in space (one or more spatial dimensions). One example of a scalar field is the temperature distribution in the air throughout the atmosphere. Obviously, a single number is assigned to each point in the space. On the other hand, a vector field defines the values of a vector quantity throughout a volume. For example, the vector field of wind velocity in the atmosphere assigns a three-dimensional vector to each point in space. Scalar quantities are denoted by normal-face type and vectors (usually) by underscored bold-face characters, e.g., $f[x,y,z]$ and $\mathbf{g}[x,y,z]$ describe scalar and vector fields, respectively. Unit vectors (vectors with unit magnitude, also called unit length) are indicated by bold-faced characters topped by a caret, e.g., $\hat{\mathbf{x}}$, $\hat{\mathbf{y}}$, and $\hat{\mathbf{z}}$.

In preparation for the discussion of vector calculus, we’ll review a few concepts of classical mechanics. Consider a force described by the vector $\mathbf{F} = \hat{\mathbf{x}} F_x + \hat{\mathbf{y}} F_y + \hat{\mathbf{z}} F_z$. The force performs “work” if it acts to create a displacement (described by the vector $\mathbf{s}$):

\[ \mathbf{F} \cdot \mathbf{s} = W \]

If the displacement is the differential element $d\mathbf{s} = \hat{\mathbf{x}}\,dx + \hat{\mathbf{y}}\,dy + \hat{\mathbf{z}}\,dz$, then the scalar product yields a differential element of work

\[ dW = \mathbf{F} \cdot d\mathbf{s} \]

and the work resulting from the action of the force from point a to point b is:

\[ W = \int_a^b \mathbf{F} \cdot d\mathbf{s} \]

Note  that  no  work  is  performed  if  the  force  acts  at  right  angles  to  the  displacement; 
the  work  is  “positive”  if  the  force  acts  in  the  direction  of  the  displacement  (e.g.,  a 
weight  dropping  in  a  gravitational  field);  the  work  is  “negative”  if  the  force  acts  in 
opposition  to  the  displacement. 

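The work can be evaluated via:

\[ W = \int \mathbf{F} \cdot d\mathbf{s} = \int (\hat{\mathbf{x}} F_x + \hat{\mathbf{y}} F_y + \hat{\mathbf{z}} F_z) \cdot (\hat{\mathbf{x}}\,dx + \hat{\mathbf{y}}\,dy + \hat{\mathbf{z}}\,dz) = \int F_x\,dx + \int F_y\,dy + \int F_z\,dz = T + c \]

where T is the kinetic energy and c is a constant.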

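If the vector force is a function only of the distance from some reference point, it may be written in terms of a scalar function of that distance, called the 3-D “potential” (or “potential energy”) V that satisfies the conditions:

\[ F_x = -\frac{\partial V}{\partial x}, \qquad F_y = -\frac{\partial V}{\partial y}, \qquad F_z = -\frac{\partial V}{\partial z} \]

We can substitute these differential expressions into the integral equation for the work:

\[ \int \mathbf{F} \cdot d\mathbf{s} = \int (\hat{\mathbf{x}} F_x + \hat{\mathbf{y}} F_y + \hat{\mathbf{z}} F_z) \cdot (\hat{\mathbf{x}}\,dx + \hat{\mathbf{y}}\,dy + \hat{\mathbf{z}}\,dz) = -\int \left( \frac{\partial V}{\partial x}\,dx + \frac{\partial V}{\partial y}\,dy + \frac{\partial V}{\partial z}\,dz \right) = -\int dV = -V = T + c \]
\[ T + V = E = \text{constant} \]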


The  sum  of  the  potential  and  kinetic  energies  is  the  total  energy,  a  constant  under 
these  conditions  of  a  “conservative  system.” 

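For a simple illustration, consider the force of gravity near the earth’s surface; the vector force is:

\[ \mathbf{F} = \hat{\mathbf{x}} F_x + \hat{\mathbf{y}} F_y + \hat{\mathbf{z}} F_z = 0\,\hat{\mathbf{x}} + 0\,\hat{\mathbf{y}} + \hat{\mathbf{z}} (-mg) \]

so that:

\[ -\frac{\partial V}{\partial x} = 0 \implies V = c_1 \]
\[ -\frac{\partial V}{\partial y} = 0 \implies V = c_2 \]
\[ -\frac{\partial V}{\partial z} = -mg \implies V = \int mg\,dz = mgz + c_3 \]
\[ \implies V[x,y,z] = mgz + (c_1 + c_2 + c_3) = mgz + \text{constant} \]
\[ E = mgz + \frac{1}{2} m v^2 \]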
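The constancy of the total energy in this gravity example can be illustrated numerically. The sketch below assumes a mass released from rest at height z0 with no air resistance, so that z(t) = z0 - g t^2 / 2 and v(t) = -g t; the mass, height, and sample times are assumed example values. It evaluates E = mgz + (1/2) m v^2 at several instants.

```python
m = 2.0      # mass in kg (assumed example value)
g = 9.81     # gravitational acceleration in m/s^2
z0 = 10.0    # release height in m (assumed example value)

def total_energy(t):
    """E = m g z + (1/2) m v^2 for free fall from rest at height z0."""
    z = z0 - 0.5 * g * t * t   # height at time t
    v = -g * t                 # velocity at time t (downward is negative)
    potential = m * g * z      # V = m g z, with the constant chosen as zero
    kinetic = 0.5 * m * v * v  # T
    return potential + kinetic

energies = [total_energy(0.1 * k) for k in range(11)]   # t = 0.0 ... 1.0 s
spread = max(energies) - min(energies)

print(energies[0], spread)
```

The potential energy lost during the fall reappears as kinetic energy, so every entry equals the initial value m g z0 to within rounding error.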

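Under the conditions of a conservative force, we can differentiate the first two expressions with respect to the “other” variable and equate them:

\[ \frac{\partial F_x}{\partial y} = \frac{\partial}{\partial y} \left( -\frac{\partial V}{\partial x} \right) = -\frac{\partial^2 V}{\partial x\,\partial y} \]
\[ \frac{\partial F_y}{\partial x} = \frac{\partial}{\partial x} \left( -\frac{\partial V}{\partial y} \right) = -\frac{\partial^2 V}{\partial y\,\partial x} \]
\[ \implies \frac{\partial F_x}{\partial y} = \frac{\partial F_y}{\partial x} \]

The same pattern of operations leads to two other relations:

\[ \frac{\partial F_x}{\partial z} = \frac{\partial F_z}{\partial x} \]
\[ \frac{\partial F_y}{\partial z} = \frac{\partial F_z}{\partial y} \]

These three are necessary and sufficient conditions that a force is conservative.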


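We can then write:

\[ \mathbf{F} = -\left( \hat{\mathbf{x}} \frac{\partial V}{\partial x} + \hat{\mathbf{y}} \frac{\partial V}{\partial y} + \hat{\mathbf{z}} \frac{\partial V}{\partial z} \right) \]

which can be written in a shorthand form by defining the first-order differential vector operator $\nabla$ (called “del”) with three components:

\[ \nabla = \left[ \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right] \]

It also may be written in explicit vector form as:

\[ \nabla = \hat{\mathbf{x}} \frac{\partial}{\partial x} + \hat{\mathbf{y}} \frac{\partial}{\partial y} + \hat{\mathbf{z}} \frac{\partial}{\partial z} \]

where $\hat{\mathbf{x}}$, $\hat{\mathbf{y}}$, and $\hat{\mathbf{z}}$ are the unit vectors along the x, y, and z axes respectively. Thus we can write:

\[ \mathbf{F} = -\nabla V \]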

It is easy to show that $\nabla$ satisfies the requirements for a linear operator:

\[ \nabla (a + b) = \nabla a + \nabla b \]
\[ \nabla (\alpha a) = \alpha \nabla a \]

where a, b are scalar fields (functions) and $\alpha$ is a numerical constant.

The del operator $\nabla$ may be applied in the same manner as a 3-D vector, though the result is a description of the rate of change of the entity to which it is applied. The operator may be applied to a 3-D “field” of scalars (such as $f[x,y,z]$, where f is a scalar “weight”); an example is the measurement of temperature at each point in $[x,y,z]$. The result $\nabla f[x,y,z]$ assigns a 3-D vector to each point in space (the gradient). The operator may be applied to a field of vectors (e.g., $\mathbf{g}[x,y,z]$) via a scalar product to create a scalar field $\nabla \cdot \mathbf{g}[x,y,z]$; this is the divergence of the vector field. Finally, it may be applied to a field of 3-D vectors to create a different 3-D vector field $\nabla \times \mathbf{g}[x,y,z]$ (the curl of the vector field). The first two operations may be generalized to operate on or generate 2-D vectors, whereas the curl is defined only for 3-D vector fields (though one or two of the components of this field may be zero).


6.3  Gradient 


Derives a Vector Field $\nabla f$ from a Scalar Field f

Applying $\nabla$ to a scalar field $f[x,y,z]$ with three dimensions (such as the temperature of air at all points in the atmosphere) generates a field of 3-D vectors which describes the spatial rate-of-change of the scalar field, i.e., the gradient of the temperature at each point in the atmosphere is a vector that describes the direction and magnitude of the change in air temperature over the 3-D volume. In the 2-D case where the scalar field describes the altitude of landform topography, the gradient vector is the size and direction of the maximum slope of the landform. The 3-D gradient is:

\[ \nabla f[x,y,z] = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right] = \hat{\mathbf{x}} \frac{\partial f}{\partial x} + \hat{\mathbf{y}} \frac{\partial f}{\partial y} + \hat{\mathbf{z}} \frac{\partial f}{\partial z} = \text{a vector} \]

As implied by its name, the gradient vector at $[x,y,z]$ points “uphill” in the direction of maximum rate-of-change of the field; the magnitude of the gradient $|\nabla f|$ is the slope of the scalar field.

[Figure: Scalar field represented as contour map and as 3-D display.]

[Figure: Gradient of the scalar field is a vector field. At each coordinate [x,y], the vector points “uphill” and its length is equal to the slope.]


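As an aside, the simple component form of the operator $\nabla$ given above applies only in Cartesian coordinates; in cylindrical and spherical coordinates the unit vectors change direction with position, so the corresponding expressions are more complicated.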
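The landform interpretation of the gradient can be sketched numerically. The example below is illustrative only: the altitude function (a smooth hill) and the evaluation point are assumed, and the partial derivatives are approximated by central differences rather than computed analytically.

```python
import math

def gradient(f, point, h=1e-5):
    """Approximate the gradient of scalar field f at `point` by central differences."""
    grad = []
    for axis in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[axis] += h
        minus[axis] -= h
        grad.append((f(*plus) - f(*minus)) / (2.0 * h))
    return grad

def altitude(x, y):
    """Hypothetical 2-D landform: a hill with its peak at the origin."""
    return 100.0 - x * x - 2.0 * y * y

g = gradient(altitude, (1.0, 0.5))   # analytically [-2x, -4y] = [-2, -2]
slope = math.hypot(*g)               # |grad f| = size of the maximum slope

print(g, slope)
```

The gradient at (1, 0.5) is approximately [-2, -2], which points back toward the peak at the origin (that is, “uphill”), and its length 2.83 is the maximum slope at that location.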


6.4  Divergence 


Derives a Scalar Field $\nabla \cdot \mathbf{g}$ from a Vector Field $\mathbf{g}$

\[ \nabla \cdot \mathbf{g}[x,y,z] = \left[ \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right] \cdot [g_x, g_y, g_z] = \frac{\partial g_x}{\partial x} + \frac{\partial g_y}{\partial y} + \frac{\partial g_z}{\partial z} = \text{a scalar} \]

The divergence at each point in a vector field is a number that describes the total spatial rate-of-change, such as the total outgoing vector flux per unit volume (flux = net outward flow), and thus is equal to:

flux = (average normal vector component) × (surface area)


For a vector field $\mathbf{g}[x,y,z]$ and an infinitesimal surface “element” described by its normal differential vector element $d\mathbf{a}$ directed outward from the volume, the differential element of the “flux” F (a scalar) of $\mathbf{g}$ through the surface element $d\mathbf{a}$ is the scalar (“dot”) product of the vector that describes the field with the vector normal to the surface. Thus the total flux is the integral over the surface:

\[ dF = \mathbf{g} \cdot d\mathbf{a} \]
\[ F = \iint_{\text{surface area } A} \mathbf{g} \cdot d\mathbf{a} \]


The divergence of the vector field describes the total flux out of a volume V, which is equivalent to the flux through the macroscopic surface area A of the volume V, which is in turn built up from all of the differential surface elements $d\mathbf{a}$ enclosing the volume:

\[ F = \iiint_{\text{volume } V} (\nabla \cdot \mathbf{g})\,dV = \iint_{\text{surface area } A} \mathbf{g} \cdot d\mathbf{a} = \iint_{\text{surface area } A} \mathbf{g} \cdot \hat{\mathbf{n}}\,da \]


where $\hat{\mathbf{n}}$ is the unit vector normal to the differential element of area da. If there are no net “sources” or “sinks” of flux within the volume (points from which the flux in the field “diverges out of” or “converges into”), then the divergence of flux through that surface must be zero:

\[ F = \iiint_{\text{volume } V} (\nabla \cdot \mathbf{g})\,dV = \iint_{\text{surface area } A} \mathbf{g} \cdot d\mathbf{a} = 0 \quad \text{if no “source” or “sink” of vector field within } V \]

If there is a source of flux in the volume, then the divergence of the vector field is proportional to that source “strength”:

\[ \iint_{\text{surface area } A} \mathbf{g} \cdot d\mathbf{a} > 0 \]

[Figure: A vector field with nonzero divergence has a disparity between input and output flux.]


Of course, the flux of an electric field is not made up of a substance that “moves” through the surface, since the electric field is not the “velocity of anything” (in Feynman’s words).

A  vector  field  whose  divergence  is  zero  used  to  be  called  “solenoidal;”  magnetic 
fields  are  solenoidal. 


6.4.1  Gauss’  Theorem  for  Divergence 


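Consider the 6-sided cube with one face (#1) located in the y-z plane at x = 0, and the “opposite” face (#2) parallel to the y-z plane at $x = \Delta x$, as shown in the figure.

[Figure: The total flux through the six faces of this small cube is calculated.]

We want to evaluate the flux of the vector field $\mathbf{g}[x,y,z]$ through each of the 6 faces. The outward normal of face #1 points along $-\hat{\mathbf{x}}$, so the outward flux through face #1 is the area integral of $-g_x[0,y,z]$:

\[ g_1 = \iint -g_x[0,y,z]\,dy\,dz \cong -g_x[0,y,z] \cdot \Delta y \cdot \Delta z \]

The outward flux through face #2 (opposite #1), whose outward normal points along $+\hat{\mathbf{x}}$, is the area integral:

\[ g_2 = \iint g_x[\Delta x,y,z]\,dy\,dz \cong \left( g_x[0,y,z] + \frac{\partial g_x}{\partial x} \cdot \Delta x \right) \cdot \Delta y \cdot \Delta z = -g_1 + \frac{\partial g_x}{\partial x} \cdot \Delta x \cdot \Delta y \cdot \Delta z \]

The sum of the two fluxes is:

\[ g_1 + g_2 = \frac{\partial g_x}{\partial x} \cdot \Delta x \cdot \Delta y \cdot \Delta z \]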


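Similarly, the sums of the fluxes through the other two pairs of opposite faces are:

\[ g_3 + g_4 = \frac{\partial g_y}{\partial y} \cdot \Delta y \cdot \Delta x \cdot \Delta z \]
\[ g_5 + g_6 = \frac{\partial g_z}{\partial z} \cdot \Delta z \cdot \Delta x \cdot \Delta y \]

The sum of the 6 fluxes is easy to evaluate:

\[ \sum_{n=1}^{6} g_n = \frac{\partial g_x}{\partial x} \cdot \Delta x \cdot \Delta y \cdot \Delta z + \frac{\partial g_y}{\partial y} \cdot \Delta x \cdot \Delta y \cdot \Delta z + \frac{\partial g_z}{\partial z} \cdot \Delta x \cdot \Delta y \cdot \Delta z = \left( \frac{\partial g_x}{\partial x} + \frac{\partial g_y}{\partial y} + \frac{\partial g_z}{\partial z} \right) \cdot \Delta x \cdot \Delta y \cdot \Delta z = (\nabla \cdot \mathbf{g})\,\Delta V \]

This is the sum of the fluxes of the normal component of the field through all of the faces of the small volume:

\[ (\nabla \cdot \mathbf{g})\,\Delta V = \iint_{\text{surface of differential volume}} (\mathbf{g}[x,y,z] \cdot \hat{\mathbf{n}})\,da \]

If we construct a macroscopic volume by summing up these differential volumes, we obtain the integral formula:

\[ \iiint_{\text{macroscopic volume}} (\nabla \cdot \mathbf{g})\,dV = \iint_{\text{surface of macroscopic volume}} (\mathbf{g}[x,y,z] \cdot \hat{\mathbf{n}})\,da \]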
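The integral formula can be tested numerically on a simple case. The following sketch assumes the unit cube from 0 to 1 on each axis and an arbitrary example field g = [x^2, yz, 3z] (neither is taken from the text); it compares the volume integral of the divergence with the outward flux through the six faces, using a midpoint rule and central differences.

```python
def g(x, y, z):
    """Assumed example vector field g = [x^2, y z, 3 z]."""
    return (x * x, y * z, 3.0 * z)

def divergence(field, x, y, z, h=1e-5):
    """Approximate d(gx)/dx + d(gy)/dy + d(gz)/dz by central differences."""
    dgx = (field(x + h, y, z)[0] - field(x - h, y, z)[0]) / (2.0 * h)
    dgy = (field(x, y + h, z)[1] - field(x, y - h, z)[1]) / (2.0 * h)
    dgz = (field(x, y, z + h)[2] - field(x, y, z - h)[2]) / (2.0 * h)
    return dgx + dgy + dgz

n = 20                                   # samples per axis
d = 1.0 / n                              # cell size
mids = [(i + 0.5) * d for i in range(n)]  # midpoints of the cells

# Volume integral of the divergence over the unit cube
volume_integral = sum(divergence(g, x, y, z)
                      for x in mids for y in mids for z in mids) * d ** 3

# Outward flux: on each pair of faces the outward normals are opposite,
# so the contribution is (component at far face) - (component at near face).
surface_flux = 0.0
for u in mids:
    for v in mids:
        surface_flux += (g(1.0, u, v)[0] - g(0.0, u, v)[0]) * d * d  # x faces
        surface_flux += (g(u, 1.0, v)[1] - g(u, 0.0, v)[1]) * d * d  # y faces
        surface_flux += (g(u, v, 1.0)[2] - g(u, v, 0.0)[2]) * d * d  # z faces

print(volume_integral, surface_flux)
```

For this field the divergence is 2x + z + 3, whose integral over the unit cube is 4.5, and the two numerical results agree with that value.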


6.5  Curl 


Derives a 3-D Vector Field from a 3-D Vector Field $\mathbf{g}$

The curl of a vector field describes a spatial nonuniformity of the 3-D vector field $\mathbf{g}[x,y,z]$. If the field describes the flow of a liquid (matter moving with a velocity), the curl determines whether the liquid is “circulating,” i.e., whether there is a net rotational motion of the vector field about some location. The word equation for the “circulation” of the vector is:

circulation = (average tangential component) × (circumference)

Rather  than  develop  the  measure  from  this  equation,  we  again  define  an  operator 
(the  “curl” )  and  show  that  it  measures  the  quantity  in  question.  The  “curl”  of  a  3-D 
vector  field  is  the  cross  product  of  the  differential  operator  V  with  the  field: 


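\[ \nabla \times \mathbf{g}[x,y,z] = \det \begin{bmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ g_x & g_y & g_z \end{bmatrix} = \hat{\mathbf{x}} \left( \frac{\partial g_z}{\partial y} - \frac{\partial g_y}{\partial z} \right) + \hat{\mathbf{y}} \left( \frac{\partial g_x}{\partial z} - \frac{\partial g_z}{\partial x} \right) + \hat{\mathbf{z}} \left( \frac{\partial g_y}{\partial x} - \frac{\partial g_x}{\partial y} \right) = \text{a vector} \]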


To visualize curl, imagine a vector field that describes motion of a fluid (e.g., water or wind). If a paddle-wheel placed at some location in the fluid does not revolve, the field at that location has no curl. If the wheel does revolve about some axis, then the curl is nonzero. The direction of the curl vector is that of the axis of the paddlewheel for the maximum rotation in the counterclockwise direction; the magnitude of the curl vector is that rotation “rate.” The algebraic sign of the curl is determined by the direction of rotation (counterclockwise ⟹ positive curl). The paddle will rotate only if the vector field is spatially nonuniform, e.g., it will not rotate in a vector field where all vectors have the same length and point in the same direction. A vector field with zero curl used to be called “irrotational;” electrostatic fields (i.e., NOT traveling electric waves) are irrotational.

Note  that  some  points  in  the  field  can  have  zero  curl  while  others  have  nonvanish¬ 
ing  curl.  Both  vector  fields  shown  in  the  examples  of  divergence  have  zero  curl,  since 
a  paddle  wheel  placed  at  any  point  in  either  field  will  not  rotate. 


[Figure: Two vector fields, $\mathbf{f}_1[x,y]$ and $\mathbf{f}_2[x,y]$, with nonzero curl. The “paddlewheel” rotates in both cases.]

6.5.1  Example  of  Function  with  Large  Curl 

Consider the 3-D field composed of vectors that satisfy:

\[ \mathbf{g}[x,y,z] = (-y)\,\hat{\mathbf{x}} + (+x)\,\hat{\mathbf{y}} + 0\,\hat{\mathbf{z}} \]

The vectors in this field lie in the x-y plane and those located on the x or y axes are oriented perpendicular to the axes and get longer with increasing distance from the origin.

[Figure: The vector field $\mathbf{g}[x,y,z] = [-y, x, 0]$.]


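This drawing is not very complete and so might be misleading; consider the vector located at $[x,y] = [1,1]$:

\[ \mathbf{g}[1,1,0] = -\hat{\mathbf{x}} + \hat{\mathbf{y}} + 0\,\hat{\mathbf{z}} \]

and so “points” in the diagonal direction (towards $\theta = +\frac{3\pi}{4}$, or towards about 10:30 if you remember what an analog clock face looks like!). Therefore the vectors in this field define a counterclockwise “flow” where the velocity of the flow increases with radial distance. It is a 1-D analogue of the “bathtub drain vortex.” The magnitudes and azimuth angles of the vectors in this field may be evaluated:

\[ |\mathbf{g}[x,y,z]| = \sqrt{(-y)^2 + (+x)^2 + 0^2} = \sqrt{x^2 + y^2} \]
\[ \phi[x,y,0] = \tan^{-1} \left[ \frac{x}{-y} \right] \]

Now evaluate the partial derivatives of the vectors:

\[ \frac{\partial g_x}{\partial x} = 0, \quad \frac{\partial g_x}{\partial y} = -1, \quad \frac{\partial g_x}{\partial z} = 0 \]
\[ \frac{\partial g_y}{\partial x} = +1, \quad \frac{\partial g_y}{\partial y} = 0, \quad \frac{\partial g_y}{\partial z} = 0 \]
\[ \frac{\partial g_z}{\partial x} = \frac{\partial g_z}{\partial y} = \frac{\partial g_z}{\partial z} = 0 \]

The curl of the field is obtained by substitution:

\[ \nabla \times \mathbf{g}[x,y,z] = \hat{\mathbf{x}} \left( \frac{\partial g_z}{\partial y} - \frac{\partial g_y}{\partial z} \right) + \hat{\mathbf{y}} \left( \frac{\partial g_x}{\partial z} - \frac{\partial g_z}{\partial x} \right) + \hat{\mathbf{z}} \left( \frac{\partial g_y}{\partial x} - \frac{\partial g_x}{\partial y} \right) \]
\[ = \hat{\mathbf{x}} (0 - 0) + \hat{\mathbf{y}} (0 - 0) + \hat{\mathbf{z}} (+1 - (-1)) \]
\[ = 0 \cdot \hat{\mathbf{x}} + 0 \cdot \hat{\mathbf{y}} + 2 \cdot \hat{\mathbf{z}} \]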

The curl vector points in the direction of the +z axis because the flow is counterclockwise (in the direction of $+\theta$), i.e., out of the plane of the flow. The direction of the curl determines that the flow is in the x-y plane, and the magnitude of the curl is related to the “speed” of the flow, if the vectors describe a motion.
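The curl of this example field can also be estimated numerically. The sketch below is illustrative: `partial` and `curl` are hypothetical helper names, and the derivatives are approximated by central differences at an assumed sample point. The result is the same vector, 2 along z, at any location in the field.

```python
def g(x, y, z):
    """The example field g = [-y, x, 0]."""
    return (-y, x, 0.0)

def partial(field, component, axis, point, h=1e-5):
    """Central-difference estimate of d(field[component]) / d(coordinate[axis])."""
    plus = list(point)
    minus = list(point)
    plus[axis] += h
    minus[axis] -= h
    return (field(*plus)[component] - field(*minus)[component]) / (2.0 * h)

def curl(field, point):
    """Curl from the determinant expansion (axes: 0 = x, 1 = y, 2 = z)."""
    return (partial(field, 2, 1, point) - partial(field, 1, 2, point),
            partial(field, 0, 2, point) - partial(field, 2, 0, point),
            partial(field, 1, 0, point) - partial(field, 0, 1, point))

curl_at_point = curl(g, (1.0, 1.0, 0.0))
curl_elsewhere = curl(g, (-3.0, 0.5, 2.0))

print(curl_at_point, curl_elsewhere)
```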

If  the  divergence 


6.6  Laplacian 

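The divergence of the gradient often appears in problems in electromagnetic theory and in imaging. It may be applied to scalar functions via:

\[ \nabla \cdot \nabla f = \nabla^2 f = \left( \hat{\mathbf{x}} \frac{\partial}{\partial x} + \hat{\mathbf{y}} \frac{\partial}{\partial y} + \hat{\mathbf{z}} \frac{\partial}{\partial z} \right) \cdot \left( \hat{\mathbf{x}} \frac{\partial f}{\partial x} + \hat{\mathbf{y}} \frac{\partial f}{\partial y} + \hat{\mathbf{z}} \frac{\partial f}{\partial z} \right) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \]

where the scalar operator $\nabla^2$ is:

\[ \nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \]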

Since the Laplacian is a scalar operator, it also may be applied to a vector field to produce the sum of the Laplacians of the three component functions along the three directions:

\[ (\nabla \cdot \nabla)\,\mathbf{g} = \nabla^2 \mathbf{g} = \hat{\mathbf{x}}\,\nabla^2 g_x + \hat{\mathbf{y}}\,\nabla^2 g_y + \hat{\mathbf{z}}\,\nabla^2 g_z \]


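The Laplacian in other coordinate systems (cylindrical or spherical) also may be defined, but produces more complicated expressions (that we probably don’t need!):

\[ \nabla^2 f[x,y,z] = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \quad \text{(Cartesian)} \]

\[ \nabla^2 f(\rho,\phi,z) = \frac{1}{\rho} \frac{\partial}{\partial \rho} \left( \rho \frac{\partial f}{\partial \rho} \right) + \frac{1}{\rho^2} \frac{\partial^2 f}{\partial \phi^2} + \frac{\partial^2 f}{\partial z^2} \quad \text{(cylindrical)} \]

\[ \nabla^2 f(r,\theta,\phi) = \frac{1}{r^2} \frac{\partial}{\partial r} \left( r^2 \frac{\partial f}{\partial r} \right) + \frac{1}{r^2 \sin\theta} \frac{\partial}{\partial \theta} \left( \sin\theta \frac{\partial f}{\partial \theta} \right) + \frac{1}{r^2 \sin^2\theta} \frac{\partial^2 f}{\partial \phi^2} \quad \text{(spherical)} \]

\[ = \frac{\partial^2 f}{\partial r^2} + \frac{2}{r} \frac{\partial f}{\partial r} + \frac{\cot\theta}{r^2} \frac{\partial f}{\partial \theta} + \frac{1}{r^2} \frac{\partial^2 f}{\partial \theta^2} + \frac{1}{r^2 \sin^2\theta} \frac{\partial^2 f}{\partial \phi^2} \quad \text{(spherical)} \]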


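The Laplacian is the spatial derivative in the 3-D wave equation, which will be considered in more detail shortly:

\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = \frac{1}{v^2} \frac{\partial^2 f}{\partial t^2} \]
\[ \nabla^2 f = \frac{1}{v^2} \frac{\partial^2 f}{\partial t^2} \]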
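A one-dimensional special case makes the wave equation concrete. The sketch below assumes a traveling sinusoid f[x,t] = sin[k(x - vt)] with arbitrary example values of k and v (none of these values come from the text), estimates both second derivatives by finite differences, and checks that the spatial derivative equals the temporal derivative divided by v squared.

```python
import math

v = 3.0   # phase velocity (assumed example value)
k = 2.0   # spatial angular frequency (assumed example value)

def f(x, t):
    """Traveling wave moving toward +x at velocity v."""
    return math.sin(k * (x - v * t))

def second_difference(func, u, h):
    """Finite-difference estimate of the second derivative of func at u."""
    return (func(u + h) - 2.0 * func(u) + func(u - h)) / (h * h)

x0, t0 = 0.4, 0.1   # sample location and time (assumed)
h = 1e-3            # step size for both derivatives

d2f_dx2 = second_difference(lambda x: f(x, t0), x0, h)
d2f_dt2 = second_difference(lambda t: f(x0, t), t0, h)

# Wave equation residual: should vanish up to discretization error
residual = d2f_dx2 - d2f_dt2 / (v * v)

print(d2f_dx2, d2f_dt2 / (v * v), residual)
```

Any function of the combination x - vt satisfies the same equation; the sinusoid is used here only because its derivatives are easy to verify by hand.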


6.6.1  Curl  of  Curl 

The curl of the curl may be evaluated via the vector triple product that was presented earlier:

\[ \nabla \times (\nabla \times \mathbf{g}) = \nabla (\nabla \cdot \mathbf{g}) - (\nabla \cdot \nabla)\,\mathbf{g} = \nabla (\nabla \cdot \mathbf{g}) - \nabla^2 \mathbf{g} \]

In words, it is the difference of the gradient of the divergence and the Laplacian of the vector field.


6.7  Electric  and  Magnetic  Fields 

By 1864, much was known about electric and magnetic effects on materials. Faraday had discovered that a time-varying magnetic field (such as from a moving magnet) can generate an electric field, and Ampere demonstrated the corresponding effect that a time-varying electric field (as from a moving electric charge) produces a magnetic field. Both electric and magnetic fields were known to be vectors that could vary in time and space: the amplitudes of the electric and magnetic fields as functions of position. Both quantities are spatial 3-D vectors that vary over time, and may be denoted by $\mathbf{E}[x,y,z,t]$ and $\mathbf{B}[x,y,z,t]$, respectively. The electric field E is measured by the force it exerts on a “test” electric charge Q (measured in coulombs). The force is determined by:

\[ \mathbf{F} \propto Q \cdot \mathbf{E} \implies \mathbf{E} \propto \frac{\mathbf{F}}{Q}, \quad \text{measured in } \left[ \frac{\text{kg} \cdot \text{m}}{\text{s}^2 \cdot \text{C}} \right] \]

where the force is measured in newtons [kg·m/s²] as the product of the charge Q and the electric field E; the field has dimensions of volts per meter (equivalent to newtons per coulomb).

6.7.1  A  Note  on  Units 


If you consult other books, you will likely see many differences in the equations due to the different systems of units used in electromagnetics (and thus in optics); many students (including the author!) find it difficult to cut through the seeming morass of differences. For example, two of the well-known physics texts on the subject, by Lorrain and Corson and by Jackson, use different systems; the former uses the rationalized MKS system (meter, kilogram, second), the latter uses CGS units (centimeter, gram, second), which includes many factors of $4\pi$. The systems evolved from Coulomb’s law that evaluates the force between two electrical charges $Q_1$ and $Q_2$:

\[ \mathbf{F} \propto \frac{Q_1 Q_2}{r_{12}^2}\,\hat{\mathbf{r}}_{12} \]

The constant of proportionality may be called k:

\[ \mathbf{F} = k\,\frac{Q_1 Q_2}{r_{12}^2}\,\hat{\mathbf{r}}_{12} \]

If the charges are measured in electrostatic units (esu) (also called statcoulombs), the distance in centimeters, and the force in dynes (1 dyn = 1 g·cm·s⁻²), then k = 1. This means that two charges of 1 esu separated by 1 cm produce a force of 1 dyn. But what if the charges are measured in coulombs [C], the distance in meters [m], and the force in newtons (1 N = 1 kg·m·s⁻²)? The value of k is determined from the knowledge that there are $10^5$ dyn per N, $2.998 \cdot 10^9$ esu per C, and $10^2$ cm per m:

\[ k = \frac{\left( 2.998 \cdot 10^9\ \frac{\text{esu}}{\text{C}} \right)^2}{\left( 10^2\ \frac{\text{cm}}{\text{m}} \right)^2 \cdot 10^5\ \frac{\text{dyn}}{\text{N}}} \cong 8.988 \cdot 10^9\ \frac{\text{N} \cdot \text{m}^2}{\text{C}^2} \]

The force between two charges of 1 C separated by 1 m is nearly $10^{10}$ N, which is about $2.2 \cdot 10^9$ pounds of force [lbf], or about 1,100,000 tons (of force)! The constant k generally is normalized by a factor of $4\pi$:


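\[ k = \frac{1}{4\pi\epsilon_0} \implies \mathbf{F} = \frac{1}{4\pi\epsilon_0}\,\frac{Q_1 Q_2}{r_{12}^2}\,\hat{\mathbf{r}}_{12} \]

\[ \text{where } \epsilon_0 = \frac{1}{4\pi k} = 8.854 \times 10^{-12}\ \frac{\text{C}^2}{\text{N} \cdot \text{m}^2} = 8.854 \times 10^{-12}\ \frac{\text{F}}{\text{m}} \]

where 1 farad [F] is equivalent to:

\[ 1\ \text{F} = 1\ \frac{\text{C}^2}{\text{N} \cdot \text{m}} \]

(one coulomb per volt), so that 1 volt is equivalent to:

\[ 1\ \text{V} = 1\ \frac{\text{N} \cdot \text{m}}{\text{C}} \]

This “new” normalization constant is called the “dielectric constant” or “permittivity” of free space.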
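The unit conversion that produces k, and from it the permittivity of free space, is simple arithmetic and can be reproduced directly. This sketch uses only the three conversion factors quoted in the text; the variable names are illustrative.

```python
import math

ESU_PER_COULOMB = 2.998e9   # statcoulombs per coulomb (value quoted in the text)
CM_PER_M = 1.0e2            # centimeters per meter
DYN_PER_N = 1.0e5           # dynes per newton

# In CGS units k = 1 dyn cm^2 / esu^2; convert each factor to MKS units.
k_mks = ESU_PER_COULOMB ** 2 / (CM_PER_M ** 2 * DYN_PER_N)   # N m^2 / C^2

# Rationalized form: k = 1 / (4 pi eps0)
eps0 = 1.0 / (4.0 * math.pi * k_mks)                          # C^2 / (N m^2) = F / m

print(k_mks, eps0)
```

The computed values are close to 8.988e9 N·m²/C² and 8.854e-12 F/m, in agreement with the numbers quoted above.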

A similar procedure for the magnetic force between two current-carrying wires leads to the exact value of a proportionality constant $\mu_0$:

\[ \mu_0 = 4\pi \cdot 10^{-7}\ \frac{\text{N}}{\text{A}^2} \]

where 1 ampere is one coulomb per second. The magnetic field in free space B (the so-called magnetic induction, measured in tesla) is then related to the magnetic field intensity H (also called the auxiliary field) in free space (measured in amperes per meter) by

\[ \mathbf{B} = \mu_0 \mathbf{H} \]


6.7.2  Magnetic  Fields 

The concept of a magnetic field is seemingly somewhat less intuitive, so we’ll consider it in somewhat more detail. The magnetic field is measured in terms of the “flux” (often labelled by $\phi$), which is a term arising from the original concept of “lines” of magnetic flux emanating from the magnetic “poles.” In fact, the original CGS unit for magnetic flux was called the “line” (now called the “maxwell,” Mx). The flux emanating from a unit field of 1 gauss is $4\pi$ lines because the area of the sphere is $4\pi r^2$. The MKS unit of magnetic flux is the “weber” (1 Wb = $10^8$ Mx), which was defined as the amount of flux which, when changing uniformly in one second, induces 1 volt in 1 turn of a conductor. In electromagnetism, the more important quantity is the magnetic flux density, labeled B and measured in gauss (CGS) or tesla (MKS). One gauss is one line (maxwell) through an area of 1 cm² and one tesla is 1 Wb per square meter:


\[ 1\ \text{G} = 1\ \frac{\text{Mx}}{\text{cm}^2} \]
\[ 1\ \text{T} = 1\ \frac{\text{Wb}}{\text{m}^2} = 1\ \frac{\text{N}}{\text{A} \cdot \text{m}} \]

Two other vector fields are required when describing propagation of electromagnetism through matter (rather than through vacuum): the electric displacement D and the magnetic field intensity H (also called the “magnetizing force” or the “auxiliary field”). We assume that any material is linear, isotropic, and homogeneous. “Linearity” means that the response of the medium to an incident field varies in proportion to the field. The response of “isotropic” media does not change with orientation of the field, while the characteristics of a “homogeneous” medium do not vary with position in the medium. The electric displacement D defines the total electric field within a material due to an external field E. It is the sum of E and any local field P generated within the matter due to the changes in positions of electric charges within the material due to that field; this induced field P is called the “polarization” of the material (not to be confused with the “polarization” of the electric field vector that we will mention later). H is a similar construct for magnetic fields. E and D, and B and H are related by the so-called constitutive equations that are determined by constants of the medium:

\[ \mathbf{D} = \epsilon \mathbf{E} \]
\[ \mathbf{B} = \mu \mathbf{H} \]

where $\epsilon$ and $\mu$ are the electric permittivity and magnetic permeability of the material, respectively. These are measures of the ability of the electric and magnetic fields to “permeate” the medium; if $\epsilon$ is increased, then a larger electric field exists within the material; if $\mu$ is larger, then the magnetic field H does not penetrate as far into the medium.


Since we will consider propagation of light only in vacuum, D = E and B = H. In vacuum, $\mu$ and $\epsilon$ are denoted $\mu_0$ and $\epsilon_0$ and both are set to unity in CGS units. In MKS units, the quantities are:

\[ \mu_0 = 4\pi \cdot 10^{-7}\ \frac{\text{N}}{\text{A}^2} \quad \text{(newtons per square ampere)} \]
\[ \epsilon_0 = 8.85 \cdot 10^{-12}\ \frac{\text{F}}{\text{m}} \quad \text{(farads per meter)} \]


As is true for the refractive index n, the permittivity and permeability in matter are larger than in vacuum, $\epsilon > \epsilon_0$ and $\mu > \mu_0$. In fact (though we won’t discuss it in detail), $\epsilon$ and $\mu$ determine the phase velocity v and the refractive index n via:

\[ v = \frac{1}{\sqrt{\mu\epsilon}} \]
\[ n = \frac{c}{v} = \frac{\sqrt{\mu\epsilon}}{\sqrt{\mu_0\epsilon_0}} \]
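These two relations can be evaluated numerically. In the sketch below, the vacuum constants are the MKS values quoted earlier; the relative permittivity of 2.25 and relative permeability of 1 are assumed illustrative values for a nonmagnetic glass-like medium, not data from the text.

```python
import math

MU0 = 4.0 * math.pi * 1.0e-7   # permeability of free space, N / A^2
EPS0 = 8.854e-12               # permittivity of free space, F / m

def phase_velocity(mu, eps):
    """v = 1 / sqrt(mu * eps)."""
    return 1.0 / math.sqrt(mu * eps)

c = phase_velocity(MU0, EPS0)  # velocity in vacuum

# Hypothetical medium (assumed values): eps = 2.25 eps0, mu = mu0
eps_medium = 2.25 * EPS0
mu_medium = 1.0 * MU0

v = phase_velocity(mu_medium, eps_medium)
n = c / v                      # refractive index

print(c, v, n)
```

The vacuum result is close to 2.998e8 m/s, and the assumed medium gives a refractive index of 1.5, the square root of its relative permittivity.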


6.8  Maxwell’s  Equations 


Maxwell  collected  the  four  differential  equations  relating  the  electric  vector  field  E 
and  the  magnetic  vector  field  B  listed  below  and  solved  them  to  derive  the  character 
of  electromagnetic  waves.  The  equations  may  be  written  in  equivalent  differential  and 
integral  forms. 


6.8.1  Gauss’  Law  for  Electric  Fields 

Gauss’  law  relates  the  flux  of  the  electric  field  over  a  closed  surface  to  the  total  charge 
enclosed  by  the  surface.  In  its  simplest  terms,  Gauss’  law  states  that  the  existence 
of  electrical  charges  within  a  volume  produces  electric  fields  that  pass  through  the 
surface  of  the  volume.  The  flux  of  the  field  through  the  surface  is  proportional  to 
the  “amount”  of  charge  within  the  volume.  If  the  volume  is  enlarged,  then  so  is  the 
surface  area,  so  the  flux  density  through  the  surface  must  decrease  at  the  same  rate 
that  the  surface  area  increases.  Also  note  that  if  there  is  no  charge  within  the  volume, 
there  still  can  be  flux  through  the  enclosing  surface,  but  the  ingoing  and  outgoing 
parts  of  the  flux  cancel  out. 

Consider an element of the closed surface defined by its normal vector $d\mathbf{a}$. The flux of the electric field through this surface element is:

\[ d\phi = \mathbf{E} \cdot d\mathbf{a} \]

where the symbol “•” denotes the scalar product of the two vector quantities. According to Gauss’ law, the integral of this quantity over the entire closed surface is:

\[ \oiint_{\text{surface}} d\phi = \oiint_{\text{surface}} \mathbf{E} \cdot d\mathbf{a} = \frac{Q}{\epsilon_0} = \frac{1}{\epsilon_0} \iiint_{\text{volume}} \rho[x,y,z]\,dV \]

where $\rho[x,y,z]$ is the volume density of charges (measured in coulombs per unit volume). If the surface encloses no charges, then this integral evaluates to zero. The differential form of Gauss’ law states that the divergence of the vector electric field is proportional to the density of electric charges.


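$$\iint_{\text{surface}} \mathbf{E} \cdot d\mathbf{a} = \iiint_{\text{volume}} \left( \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} \right) dV = \iiint_{\text{volume}} \left( \nabla \cdot \mathbf{E}[x,y,z,t] \right) dV$$

$$\iiint_{\text{volume}} \left( \nabla \cdot \mathbf{E}[x,y,z,t] \right) dV = \frac{1}{\epsilon} \iiint_{\text{volume}} \rho[x,y,z]\, dV$$

$$\Longrightarrow \nabla \cdot \mathbf{E}[x,y,z,t] = \frac{\rho[x,y,z]}{\epsilon}$$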


In words, the divergence of the vector electric field is a scalar that is proportional to the charge density at that point; integrated over a volume, it is proportional to the total charge within the volume.
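The integral form of Gauss’ law can be checked numerically. The following is a minimal sketch, assuming a single point charge at the origin in vacuum and a simple midpoint-rule integration over a centered sphere; the function names and the 1 nC charge are illustrative choices, not part of the original notes.

```python
import math

EPS0 = 8.85e-12  # permittivity of vacuum in F/m (value quoted in the text)


def point_charge_field(q, x, y, z):
    """Coulomb field (Ex, Ey, Ez) in V/m of a point charge q [C] at the origin."""
    r2 = x * x + y * y + z * z
    scale = q / (4.0 * math.pi * EPS0 * r2 * math.sqrt(r2))
    return (scale * x, scale * y, scale * z)


def flux_through_sphere(q, radius, n_theta=100, n_phi=200):
    """Midpoint-rule estimate of the integral of E . da over a centered sphere."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            # Outward unit normal of the surface element.
            nx = math.sin(theta) * math.cos(phi)
            ny = math.sin(theta) * math.sin(phi)
            nz = math.cos(theta)
            ex, ey, ez = point_charge_field(q, radius * nx, radius * ny, radius * nz)
            d_area = radius ** 2 * math.sin(theta) * d_theta * d_phi
            total += (ex * nx + ey * ny + ez * nz) * d_area
    return total


Q = 1.0e-9  # assumed enclosed charge of 1 nC (illustrative)
for radius in (0.1, 1.0, 10.0):
    # The flux should be close to Q / EPS0 regardless of the radius.
    print(radius, flux_through_sphere(Q, radius), Q / EPS0)
```

The flux is essentially the same for every radius: the surface area grows as the square of the radius while the field falls off at the same rate, as described above.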


6.8.2  Gauss’  Law  for  Magnetic  Fields 

Since there are no magnetic analogues for “charges”, the volume cannot enclose a magnetic analogue of $\rho$ or Q, which leads to particularly simple forms of Gauss’ law for the magnetic flux density:

$$\iint_{\text{surface}} \mathbf{B} \cdot d\mathbf{a} = 0$$

$$\nabla \cdot \mathbf{B}[x,y,z,t] = 0$$

In other words, the flux of the magnetic field through any closed surface ALWAYS is zero. This is often interpreted by the statement that there are no magnetic analogues of charges (called magnetic “monopoles”). An “electric monopole” is an electron or proton that carries a net charge of one “sign.”


6.8.3  Faraday’s  Law  of  Magnetic  Induction 

Michael Faraday observed in 1831 the phenomenon that he called “electromagnetic induction,” which generates (“induces”) electricity in a wire from a changing current in another wire. In other words, he discovered the basis for the electric transformer. Shortly thereafter, Faraday discovered magneto-electric induction: how to produce a steady electric current in a wire by physically manipulating a magnet near the wire. Faraday attached two wires to a copper disc through a sliding contact. He rotated the disc between the poles of a horseshoe magnet and generated a continuous direct current; in short, this was the first generator (the “dynamo”).

The mathematical formulation of Faraday’s magneto-electric induction is called Faraday’s law, which states that the rate of change of a magnetic field through a 2-D surface is proportional to the circulation of the electric field around the 1-D perimeter of the 2-D surface. In mathematical terms, the time derivative of the magnetic field is proportional to the particular spatial derivative (the curl) of the electric field:

$$\frac{\partial \mathbf{B}}{\partial t} = -\nabla \times \mathbf{E}$$


Thus  a  time-varying  magnetic  field  produces  a  spatially  varying  electric  field,  and 
vice  versa. 


6.8.4  Ampere’s  Law 


The analogue of Faraday’s law relates the rate of change of the flux of an electric field through a surface to the circulation of the magnetic field around the perimeter of the surface. Ampere’s original law included only the flux of electric current (due to moving electric charges) through the surface as the source of this circulation; the term in the rate of change of the electric field is the “correction term” added by Maxwell. The corrected form of Ampere’s law is:

$$+\epsilon \frac{\partial \mathbf{E}}{\partial t} + \mathbf{J} = \nabla \times \frac{\mathbf{B}}{\mu}$$

where the source term J is the “current density” (measured in amperes per unit area, equivalent to coulombs per second per unit area). Note the difference in the algebraic sign in the analogous expressions of Faraday’s law and Ampere’s law.

We have already seen that:

$$\mu_0 \epsilon_0 = \frac{1}{c^2}$$

where c is the velocity of light, $c = 2.99792458 \times 10^{8}\ \mathrm{m\,s^{-1}}$, which shows that the effect of the spatial variation of the magnetic field produces a much smaller temporal change in the electric field than vice versa.

There  are  two  “source”  terms  in  Maxwell’s  equations:  the  “static”  charge  density 
p  and  the  “dynamic”  current  density  J.  These  can  only  be  nonzero  within  media 
(such  as  copper  wire)  and  thus  vanish  in  vacuum.  If  we  consider  the  propagation  of 
light  only  in  a  vacuum,  neither  electric  charges  nor  conductors  are  present  and  both 
source  terms  vanish. 


6.8.5  Maxwell’s  Equations 

(Jackson,  Classical  Electrodynamics,  §6) 

In 1864, James Clerk Maxwell collected these four equations and derived the form of the fields that simultaneously satisfy them in some simple cases. In rationalized MKS units, the differential forms of the equations (after adding the correction term to Ampere’s law) are:

$$
\begin{aligned}
\nabla \cdot \mathbf{E} &= \frac{\rho}{\epsilon} && \text{Gauss' Law for Electric Fields, Coulomb's Law} \\
\nabla \cdot \mathbf{B} &= 0 && \text{Gauss' Law for Magnetic Fields} \\
-\frac{\partial \mathbf{B}}{\partial t} &= \nabla \times \mathbf{E} && \text{Faraday's Law of Magnetic Induction} \\
\epsilon \frac{\partial \mathbf{E}}{\partial t} &= \nabla \times \frac{\mathbf{B}}{\mu} - \mathbf{J} && \text{Ampere's Law}
\end{aligned}
$$

These  four  coupled  first-order  differential  equations  can  be  solved  directly  in  many 
cases;  we  will  do  a  simple  one  now  that  leads  to  propagation  of  electromagnetic  plane 
waves. 

The  definition  of  curl  may  be  used  to  rewrite  the  four  first-order  differential  vector 
equations  of  Maxwell  as  eight  first-order  scalar  differential  equations: 


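$$
\begin{aligned}
\frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} &= 0 && \text{Gauss' Law for } \mathbf{E} \text{ if no sources present} && (1) \\
\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z} &= 0 && \text{Gauss' Law for } \mathbf{B} && (2) \\
-\frac{\partial B_x}{\partial t} &= \frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} && \text{Faraday's Law} && (3) \\
-\frac{\partial B_y}{\partial t} &= \frac{\partial E_x}{\partial z} - \frac{\partial E_z}{\partial x} && && (4) \\
-\frac{\partial B_z}{\partial t} &= \frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y} && && (5) \\
+\mu\epsilon \frac{\partial E_x}{\partial t} &= \frac{\partial B_z}{\partial y} - \frac{\partial B_y}{\partial z} && \text{Ampere's Law} && (6) \\
+\mu\epsilon \frac{\partial E_y}{\partial t} &= \frac{\partial B_x}{\partial z} - \frac{\partial B_z}{\partial x} && && (7) \\
+\mu\epsilon \frac{\partial E_z}{\partial t} &= \frac{\partial B_y}{\partial x} - \frac{\partial B_x}{\partial y} && && (8)
\end{aligned}
$$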


6.9  Wave  Equation 


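Take the curl of both sides of Faraday’s law. We can use the expression for the “curl of the curl” previously mentioned (though not derived) to evaluate the curl of the curl of the electric field:

$$
\begin{aligned}
\nabla \times (\nabla \times \mathbf{E}) &= \nabla (\nabla \cdot \mathbf{E}) - (\nabla \cdot \nabla) \mathbf{E} \\
&= \nabla (\nabla \cdot \mathbf{E}) - \nabla^2 \mathbf{E} \\
&= 0 - \nabla^2 \mathbf{E}
\end{aligned}
$$

where Gauss’ law applicable in “source-free” regions has been used in the last step (the enclosed charge is zero). The right side of the equation may be rewritten by applying Ampere’s law, which in vacuum with no currents reads $\nabla \times \mathbf{B} = \mu_0 \epsilon_0\, \partial \mathbf{E} / \partial t = (1/c^2)\, \partial \mathbf{E} / \partial t$:

$$\nabla \times \left( -\frac{\partial \mathbf{B}}{\partial t} \right) = -\frac{\partial}{\partial t} (\nabla \times \mathbf{B}) = -\frac{1}{c^2} \frac{\partial^2 \mathbf{E}}{\partial t^2}$$

After equating the two sides of the equation:

$$\nabla^2 \mathbf{E} = \frac{1}{c^2} \frac{\partial^2 \mathbf{E}}{\partial t^2}$$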


where  c  is  the  velocity  of  light.  This  expression  relates  the  spatial  and  temporal 
second  derivatives  of  the  electric  field  and  is  called  the  wave  equation.  It  was  first 
introduced  by  d’Alembert  in  1747.  It  assumes  that  no  energy  of  the  wave  is  lost  as  it 
propagates,  as  would  occur  if  friction  or  damping  forces  are  present. 

The  general  wave  equation  may  be  written: 


$$\nabla^2 \psi[x,y,z,t] = \frac{1}{v^2} \frac{\partial^2}{\partial t^2} \psi[x,y,z,t]$$

where  v  is  the  velocity  of  a  point  of  constant  phase:  our  old  friend  the  phase  velocity. 
The  wave  equation  may  be  rigorously  derived  for  a  transverse  wave  on  a  string  -  you 
probably  saw  this  in  a  classical  mechanics  course. 

The wave equation for electric fields confirms our earlier observation:

$$v = \frac{1}{\sqrt{\mu\epsilon}}$$

Think of this result for a second: the phase velocity of the wave in a medium is related to two measurable properties of the medium, the permittivity and the permeability.

The 1-D equation may be written in the form of a “second-order homogeneous” differential equation:

$$\left( \frac{\partial^2}{\partial z^2} - \frac{1}{v^2} \frac{\partial^2}{\partial t^2} \right) \psi[z,t] = 0$$

This differential equation is linear, so that if $\psi_1[z,t]$ and $\psi_2[z,t]$ are solutions to the equation, so is $a\,\psi_1[z,t] + b\,\psi_2[z,t]$. The linearity property means that light beams can pass “through” each other and that waves can constructively or destructively interfere.

The wave equation has the simple solution:

$$\psi[z,t] = f[z \pm vt]$$

where $f[u]$ is any function that may be differentiated twice.

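Proof.

Define $u = z \pm vt \Longrightarrow \dfrac{\partial u}{\partial z} = 1,\ \dfrac{\partial u}{\partial t} = \pm v$.

Apply the chain rule:

$$\frac{\partial f}{\partial z} = \frac{df}{du} \cdot \frac{\partial u}{\partial z} = \frac{df}{du} \cdot 1 \quad \text{and} \quad \frac{\partial f}{\partial t} = \frac{df}{du} \cdot \frac{\partial u}{\partial t} = \frac{df}{du} \cdot (\pm v)$$

Differentiate a second time:

$$\frac{\partial^2 f}{\partial z^2} = \frac{d^2 f}{du^2}, \qquad \frac{\partial^2 f}{\partial t^2} = \frac{d^2 f}{du^2} \cdot (\pm v)^2 = v^2 \frac{d^2 f}{du^2}$$

Substitute into the wave equation:

$$\frac{\partial^2 f}{\partial z^2} - \frac{1}{v^2} \frac{\partial^2 f}{\partial t^2} = \frac{d^2 f}{du^2} - \frac{1}{v^2} \cdot v^2 \frac{d^2 f}{du^2} = 0$$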
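The proof can be spot-checked numerically with finite differences. This is a minimal sketch, assuming a Gaussian pulse as the arbitrary profile and arbitrary (dimensionless) values for v, z, and t; the helper names are hypothetical and chosen only for illustration.

```python
import math


def second_derivative(func, x, h):
    """Central-difference estimate of the second derivative of func at x."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)


def wave_equation_residual(f, v, z, t, sign=-1, h=1e-3):
    """Estimate d2psi/dz2 - (1/v^2) d2psi/dt2 for psi[z, t] = f[z + sign*v*t]."""
    psi_of_z = lambda zz: f(zz + sign * v * t)   # psi along z at fixed t
    psi_of_t = lambda tt: f(z + sign * v * tt)   # psi along t at fixed z
    d2_dz2 = second_derivative(psi_of_z, z, h)
    d2_dt2 = second_derivative(psi_of_t, t, h)
    return d2_dz2 - d2_dt2 / (v * v)


def gaussian_pulse(u):
    """A twice-differentiable profile, chosen only for illustration."""
    return math.exp(-u * u)


# Residuals for the wave moving toward +z (sign=-1) and toward -z (sign=+1).
print(wave_equation_residual(gaussian_pulse, v=2.0, z=0.3, t=0.1, sign=-1))
print(wave_equation_residual(gaussian_pulse, v=2.0, z=0.3, t=0.1, sign=+1))
```

Both residuals are zero to within the truncation error of the finite differences, whichever sign is chosen, in agreement with the proof.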


The  expressions  for  sinusoidal  waves  derived  in  the  last  section  satisfy  the  wave 
equation: 


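$$\frac{\partial^2}{\partial z^2} \left( \hat{\mathbf{x}}\, E_0 \cos[kz - \omega t] \right) = \hat{\mathbf{x}}\, E_0 \left( -k^2 \right) \cos[kz - \omega t]$$

$$\frac{1}{v^2} \frac{\partial^2}{\partial t^2} \left( \hat{\mathbf{x}}\, E_0 \cos[kz - \omega t] \right) = \frac{1}{v^2}\, \hat{\mathbf{x}}\, E_0 \left( -\omega^2 \cos[kz - \omega t] \right) = -\hat{\mathbf{x}}\, E_0 \left( \frac{\omega}{v} \right)^2 \cos[kz - \omega t]$$

$$\Longrightarrow \frac{\omega^2}{v^2} = k^2 \Longrightarrow v = \frac{\omega}{k}$$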


thus  agreeing  with  our  previous  expression  for  phase  velocity. 
If a solution to the wave equation has the form:

$$\psi[z,t] = f[z - vt]$$

where the form of the function f is arbitrary, then the argument of the function $[z - vt]$ (the “phase”) remains constant if z increases with increasing time. The “shape” f moves towards $z = +\infty$ with increasing time without changing its shape (i.e., without “dispersion”). A second solution to this equation is:

$$\psi[z,t] = g[z + vt]$$

which moves towards $z = -\infty$.

The spatial derivative of the corresponding 3-D wave equation is the sum of the three second partial derivatives:

$$\frac{1}{v^2} \frac{\partial^2}{\partial t^2} \psi[x,y,z,t] = \nabla^2 \psi[x,y,z,t] = \frac{\partial^2}{\partial x^2} \psi[x,y,z,t] + \frac{\partial^2}{\partial y^2} \psi[x,y,z,t] + \frac{\partial^2}{\partial z^2} \psi[x,y,z,t]$$


The 3-D wave may still be a sinusoid with argument in radians, so we must be more careful about how the 3-D function becomes a 1-D function. The x, y, and z dependencies all have associated “wavelengths” that may be defined by their corresponding “wavenumbers” $k_x$, $k_y$, $k_z$, which may be written as a “wavevector” $\mathbf{k}_0$:

$$
\begin{aligned}
\psi[x,y,z,t] &= A \cos\left[ \Phi[x,y,z,t] \right] \\
&= A \cos\left[ k_x x + k_y y + k_z z \pm \omega_0 t - \phi_0 \right] \\
&= A \cos\left[ \mathbf{k}_0 \cdot \mathbf{r} \pm \omega_0 t - \phi_0 \right]
\end{aligned}
$$

Note that the components of the electric and magnetic fields ($E_x$, $E_y$, $E_z$, $B_x$, $B_y$, and $B_z$) all satisfy the wave equation.


6.9.1  Electromagnetic  Waves  from  Maxwell’s  Equations 

In  the  general  case,  the  electric  field  and  magnetic  fields  can  have  the  form: 

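$$\mathbf{E}[x,y,z,t] = \hat{\mathbf{x}}\, E_x[x,y,z,t] + \hat{\mathbf{y}}\, E_y[x,y,z,t] + \hat{\mathbf{z}}\, E_z[x,y,z,t]$$

$$\mathbf{B}[x,y,z,t] = \hat{\mathbf{x}}\, B_x[x,y,z,t] + \hat{\mathbf{y}}\, B_y[x,y,z,t] + \hat{\mathbf{z}}\, B_z[x,y,z,t]$$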

We will now solve these equations for a single specific case: an infinite plane electric field wave propagating in vacuum toward $z = +\infty$. The locus of points of constant phase (often called a wavefront) of a plane wave is (obviously) a plane. The vector electric field E is assumed to be constant as a function of x and/or y at a particular value of z, but its vector amplitude can vary with z; this variation will be shown to be sinusoidal. This constraint means that the derivatives of the components with respect to the coordinates “transverse” to the direction of travel must vanish:

$$\frac{\partial E_x}{\partial x} = \frac{\partial E_x}{\partial y} = \frac{\partial E_y}{\partial x} = \frac{\partial E_y}{\partial y} = \frac{\partial E_z}{\partial x} = \frac{\partial E_z}{\partial y} = 0 \qquad (9)$$

and thus the expression for the 4-D vector field $\mathbf{E}[x,y,z,t]$ can be simplified; the components are no longer functions of x or y:

$$\mathbf{E}[x,y,z,t] = \mathbf{E}[z,t] = \hat{\mathbf{x}}\, E_x[z,t] + \hat{\mathbf{y}}\, E_y[z,t] + \hat{\mathbf{z}}\, E_z[z,t] \qquad (10)$$

From (9) and Gauss’ law for electric fields (1), we find that:

$$\frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} = 0 + 0 + \frac{\partial E_z}{\partial z} = 0 \Longrightarrow \frac{\partial E_z}{\partial z} = 0 \qquad (11)$$


Since the derivative of $E_z$ with respect to z vanishes, the z-component of the electric field $E_z$ must be constant, and its amplitude is arbitrary so we select it to be 0:

$$E_z = \text{constant} \rightarrow 0 \qquad (12)$$

Therefore, the electric field is now expressible in a much simpler form:

$$\mathbf{E}[x,y,z,t] = \hat{\mathbf{x}}\, E_x[z,t] + \hat{\mathbf{y}}\, E_y[z,t] \qquad (13)$$

i.e., the only existing electric field is perpendicular (transverse) to z! This is a significant result in and of itself. We can simplify eq. (13) by rotating the coordinate system about the z-axis to align E with the x-axis, so that:

$$E_y[z,t] = 0 \quad \text{by assumption} \qquad (14)$$

The resulting expression for the electric field is now very simple:

$$\mathbf{E}[x,y,z,t] = \hat{\mathbf{x}}\, E_x[z,t] \qquad (15)$$


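We can substitute this expression into Faraday’s Law (eqs. 3, 4, 5) to find the magnetic field:

$$
\begin{aligned}
-\frac{\partial B_x}{\partial t} &= \frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} = 0 - 0 = 0 \Longrightarrow B_x \text{ is constant with time} && (3) \\
-\frac{\partial B_y}{\partial t} &= \frac{\partial E_x}{\partial z} - \frac{\partial E_z}{\partial x} = \frac{\partial E_x}{\partial z} && (4) \\
-\frac{\partial B_z}{\partial t} &= \frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y} = 0 \Longrightarrow B_z \text{ is constant with time} && (5)
\end{aligned}
$$

We can arbitrarily set the constant terms $B_x = B_z = 0$, so the only remaining equation is:

$$\frac{\partial B_y}{\partial t} = -\frac{\partial E_x}{\partial z}$$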

which says that the time derivative of the magnetic field $B_y$ is equal to the negative of the space derivative of $E_x$. We can now find a relation between $B_y$ and $E_x$ by standard solution techniques of differential equations. For example, we can use the wave equation:

$$\nabla^2 \mathbf{E} = \frac{1}{c^2} \frac{\partial^2 \mathbf{E}}{\partial t^2} \Longrightarrow \frac{\partial^2 E_x[z,t]}{\partial z^2} = \frac{1}{c^2} \frac{\partial^2 E_x[z,t]}{\partial t^2}$$

Assume that $E_x[z,t]$ varies sinusoidally with z:

$$\mathbf{E}[x,y,z,t] = \hat{\mathbf{x}}\, E_x[z,t] = \hat{\mathbf{x}}\, E_0 \cos[k_0 z - \omega_0 t]$$

It is easy to show that this satisfies the wave equation if the phase velocity of the wave is what we would predict:

$$
\begin{aligned}
\frac{\partial^2 E_x[z,t]}{\partial x^2} &= \frac{\partial^2 E_x[z,t]}{\partial y^2} = 0 \\
\frac{\partial^2 E_x[z,t]}{\partial z^2} &= -k_0^2\, E_0 \cos[k_0 z - \omega_0 t] \\
\frac{\partial^2 E_x[z,t]}{\partial t^2} &= -\omega_0^2\, E_0 \cos[k_0 z - \omega_0 t] \\
\Longrightarrow c &= \frac{\omega_0}{k_0}
\end{aligned}
$$

Now substitute this to solve for $B_y$ by integration (the constant of integration is set to zero):

$$
\begin{aligned}
\frac{\partial B_y}{\partial t} &= -\frac{\partial E_x}{\partial z} = +k_0 E_0 \sin[k_0 z - \omega_0 t] \\
B_y[z,t] &= k_0 E_0 \int \sin[k_0 z - \omega_0 t]\, dt \\
&= E_0\, \frac{k_0}{\omega_0} \cos[k_0 z - \omega_0 t] \\
&= \frac{E_0}{v_\phi} \cos[k_0 z - \omega_0 t]
\end{aligned}
$$

$$\mathbf{B}[z,t] = \hat{\mathbf{y}} \left( \frac{E_0}{v_\phi} \cos[k_0 z - \omega_0 t] \right)$$

where $v_\phi$ is the phase velocity of the electromagnetic wave that we have already defined.

$$B_y = \frac{E_0}{v_\phi} \cos[k_0 z - \omega_0 t] = \frac{E_x}{v_\phi} \Longrightarrow E_x = v_\phi B_y$$

Note that the only existing component of B is $B_y$, which is perpendicular to the component $E_x$ of E, and that both E and B are transverse to the direction of propagation (in the +z direction). Also note that the sinusoidal variations of E and B have the same arguments, which means that they oscillate “in phase”. The amplitude of the magnetic field is smaller by the factor of the phase velocity $v_\phi = c$, so the effect of the magnetic field on observations is generally much smaller and often may be ignored in physical situations, though it is essential for the electric field to propagate.
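The plane-wave solution can be checked against Faraday’s law (4) and Ampere’s law (6) with central differences. This is a minimal sketch, assuming a wave in vacuum with an illustrative amplitude of 100 V/m and a vacuum wavelength of 500 nm; these values, the step sizes, and the function names are assumptions made for the example.

```python
import math

MU0 = 4.0e-7 * math.pi            # permeability of vacuum, N/A^2
EPS0 = 8.85e-12                   # permittivity of vacuum, F/m
C = 1.0 / math.sqrt(MU0 * EPS0)   # phase velocity in vacuum, m/s

E0 = 100.0                        # assumed amplitude, V/m
K0 = 2.0 * math.pi / 500e-9       # wavenumber for an assumed 500 nm wavelength
OMEGA0 = C * K0                   # angular frequency, rad/s


def e_x(z, t):
    """x-component of E for the plane wave traveling toward +z."""
    return E0 * math.cos(K0 * z - OMEGA0 * t)


def b_y(z, t):
    """y-component of B, in phase with E and smaller by the factor c."""
    return (E0 / C) * math.cos(K0 * z - OMEGA0 * t)


def d_dz(field, z, t, h=1e-10):
    """Central-difference derivative with respect to z."""
    return (field(z + h, t) - field(z - h, t)) / (2.0 * h)


def d_dt(field, z, t, h=1e-19):
    """Central-difference derivative with respect to t."""
    return (field(z, t + h) - field(z, t - h)) / (2.0 * h)


def faraday_residual(z, t):
    """(dBy/dt + dEx/dz) normalized by k0*E0; near zero if eq. (4) holds."""
    return (d_dt(b_y, z, t) + d_dz(e_x, z, t)) / (K0 * E0)


def ampere_residual(z, t):
    """(mu0*eps0*dEx/dt + dBy/dz) normalized by k0*E0/c; near zero if eq. (6) holds."""
    return (MU0 * EPS0 * d_dt(e_x, z, t) + d_dz(b_y, z, t)) * C / (K0 * E0)


print(faraday_residual(1.3e-7, 2.0e-16), ampere_residual(1.3e-7, 2.0e-16))
```

Both residuals are small compared with unity, limited only by the finite-difference step. Replacing `E0 / C` in `b_y` by any other amplitude makes them nonzero, which is the numerical counterpart of the result $E_x = v_\phi B_y$.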

6.9.2  Poynting  Vector 

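Given transverse electric and magnetic fields E and B, the light propagates in the mutually orthogonal direction k:

$$\mathbf{s} = c^2 \epsilon\, \mathbf{E} \times \mathbf{B}$$

This is “Poynting’s vector” and has dimensions of power per unit area (e.g., $\mathrm{W/m^2}$). It measures the total flow of energy through unit area in unit time.

In this case with $\mathbf{E}[x,y,z,t] = \hat{\mathbf{x}}\, E_x[z,t]$ and $\mathbf{B}[x,y,z,t] = \hat{\mathbf{y}}\, B_y[z,t]$, the propagation direction is:

$$
\begin{aligned}
\mathbf{E}[x,y,z,t] &= \hat{\mathbf{x}}\, E_x[z,t] \\
\mathbf{B}[x,y,z,t] &= \hat{\mathbf{y}}\, B_y[z,t] = \hat{\mathbf{y}}\, \frac{E_x[z,t]}{c} \\
\mathbf{s} &= (\hat{\mathbf{x}} \times \hat{\mathbf{y}}) \cdot c^2 \epsilon\, \frac{E_x^2}{c} = \hat{\mathbf{z}} \left( c \epsilon E_x^2 \right)
\end{aligned}
$$

along the +z direction, as predicted.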
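The direction and magnitude of the Poynting vector for this case can be evaluated directly. The following is a minimal sketch, assuming vacuum and an illustrative instantaneous field of 100 V/m; the helper names `cross` and `poynting` are hypothetical.

```python
import math

MU0 = 4.0e-7 * math.pi            # permeability of vacuum, N/A^2
EPS0 = 8.85e-12                   # permittivity of vacuum, F/m
C = 1.0 / math.sqrt(MU0 * EPS0)   # phase velocity in vacuum, m/s


def cross(a, b):
    """Vector product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def poynting(e_field, b_field):
    """s = c^2 * eps0 * (E x B) in W/m^2 for E in V/m and B in tesla."""
    return tuple(C * C * EPS0 * component for component in cross(e_field, b_field))


E_X = 100.0                        # assumed instantaneous value of Ex, V/m
s = poynting((E_X, 0.0, 0.0), (0.0, E_X / C, 0.0))
print(s)  # only the z-component is nonzero
```

With these assumed values the z-component equals $c \epsilon_0 E_x^2$, about 26.5 W/m², and the x- and y-components vanish.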

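The average power of the light wave per unit area is the “irradiance,” and is determined from the Poynting vector:

$$
\begin{aligned}
I[x,y,z,t] = \langle \mathbf{s}[x,y,z,t] \rangle &= \frac{1}{\Delta T} \int_{t - \Delta T/2}^{t + \Delta T/2} \mathbf{s}[x,y,z,t']\, dt' \\
&= \frac{1}{\Delta T} \int_{t - \Delta T/2}^{t + \Delta T/2} c^2 \epsilon \left( \hat{\mathbf{x}}\, E_0 \cos[k_0 z - \omega_0 t'] \right) \times \left( \hat{\mathbf{y}}\, \frac{E_0}{c} \cos[k_0 z - \omega_0 t'] \right) dt' \\
&= (\hat{\mathbf{x}} \times \hat{\mathbf{y}})\, \frac{1}{\Delta T} \int_{t - \Delta T/2}^{t + \Delta T/2} c^2 \epsilon\, \frac{E_0^2}{c} \cos^2[k_0 z - \omega_0 t']\, dt' \\
&= \hat{\mathbf{z}} \left( c \epsilon E_0^2 \right) \cdot \frac{1}{\Delta T} \int_{t - \Delta T/2}^{t + \Delta T/2} \cos^2[k_0 z - \omega_0 t']\, dt' \\
&= \hat{\mathbf{z}} \left( c \epsilon E_0^2 \right) \cdot \frac{1}{2}
\end{aligned}
$$

$$\left| \langle \mathbf{s} \rangle \right| = \frac{1}{2}\, c \epsilon E_0^2$$

6.9.3 Redux on Phase Velocity of Electromagnetic Waves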


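Given the form for the plane electromagnetic wave in a vacuum, we can now use the three Ampere relations to find something else useful:

$$
\begin{aligned}
+\mu\epsilon \frac{\partial E_x}{\partial t} &= \frac{\partial B_z}{\partial y} - \frac{\partial B_y}{\partial z} && (6) \\
+\mu\epsilon \frac{\partial E_y}{\partial t} &= \frac{\partial B_x}{\partial z} - \frac{\partial B_z}{\partial x} && (7) \\
+\mu\epsilon \frac{\partial E_z}{\partial t} &= \frac{\partial B_y}{\partial x} - \frac{\partial B_x}{\partial y} && (8)
\end{aligned}
$$

Because $\mathbf{E} = \hat{\mathbf{x}}\, E_x$ and $\mathbf{B} = \hat{\mathbf{y}}\, B_y$, only (6) does not vanish:

$$
\begin{aligned}
\mu\epsilon \frac{\partial}{\partial t} \left( E_0 \cos[k_0 z - \omega_0 t] \right) &= -\frac{\partial}{\partial z} \left( \frac{E_0}{v_\phi} \cos[k_0 z - \omega_0 t] \right) \\
\mu\epsilon E_0 \left( \omega_0 \sin[k_0 z - \omega_0 t] \right) &= -\frac{E_0}{v_\phi} \left( -k_0 \sin[k_0 z - \omega_0 t] \right) \\
\mu\epsilon\, \omega_0 E_0 &= \frac{E_0 k_0}{v_\phi} \\
\mu\epsilon &= \frac{k_0}{\omega_0 v_\phi} = \frac{1}{v_\phi^2} \\
v_\phi^2 &= \left( \frac{\omega_0}{k_0} \right)^2 = \frac{1}{\mu\epsilon} \\
v_\phi &= \left( \sqrt{\mu\epsilon} \right)^{-1}
\end{aligned}
$$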

which we already knew from the wave equation. In vacuum, $\mu = \mu_0$, $\epsilon = \epsilon_0$, $v_\phi = c$, and $c = 1/\sqrt{\mu_0 \epsilon_0}$. The permittivity and permeability of free space (vacuum) can be measured in laboratory experiments, thus allowing a calculation of the phase velocity of electromagnetic waves. The permeability in vacuum is:

$$\mu \rightarrow \mu_0 = 4\pi \times 10^{-7}\ \frac{\text{newton}}{\text{ampere}^2} = 1.26 \times 10^{-6}\ \frac{\text{newton}}{\text{ampere}^2}$$

and the permittivity is:

$$\epsilon_0 = 8.85 \times 10^{-12}\ \frac{\text{farads}}{\text{m}}$$
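The calculation of the phase velocity from the two measured constants takes only a few lines. This is a minimal sketch using the rounded values quoted above; the function names are illustrative, and the relative permittivity and permeability passed to `refractive_index` are dimensionless ratios to the vacuum values.

```python
import math

MU0 = 4.0e-7 * math.pi   # permeability of vacuum, N/A^2
EPS0 = 8.85e-12          # permittivity of vacuum, F/m (rounded value from the text)


def phase_velocity(mu, eps):
    """Phase velocity v = 1 / sqrt(mu * eps) in m/s."""
    return 1.0 / math.sqrt(mu * eps)


def refractive_index(mu_relative, eps_relative):
    """n = c / v = sqrt(mu * eps / (mu0 * eps0)) for real-valued ratios."""
    return math.sqrt(mu_relative * eps_relative)


c_estimate = phase_velocity(MU0, EPS0)
print(c_estimate)  # close to 3.0e8 m/s
```

Because the permittivity is rounded to three significant figures, the result differs slightly from the accepted value $2.99792458 \times 10^8$ m/s, by about 0.03%.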


Index  of  Refraction 

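Of course, the dimensionless ratio of the velocity of light in vacuum to that in the medium is the index of refraction n:

$$n = \frac{c}{v} = \sqrt{\frac{\mu\epsilon}{\mu_0 \epsilon_0}} \geq 1$$

For metals and absorptive materials, the index of refraction is complex valued and the permeabilities may not be equal. The complex refractive index often is denoted by $\tilde{n}$ and its imaginary part by $\kappa$:

$$\tilde{n}^2 = (n + i\kappa)^2 = \frac{\mu\epsilon}{\mu_0 \epsilon_0}$$

so that

$$|\mathbf{k}| = k = \tilde{n}\, \frac{\omega_0}{c} = (n + i\kappa)\, \frac{\omega_0}{c}$$

which implies that the wavevector k is complex-valued as well. In this situation, the propagating electric field is written:

$$
\begin{aligned}
\mathbf{E}[x,y,z,t] &= \mathbf{E}_0 \exp\left[ i \left( \mathbf{k} \cdot \mathbf{r} - \omega_0 t \right) \right] \\
&= \mathbf{E}_0 \exp\left[ i \left( k\, (\mathbf{s} \cdot \mathbf{r}) - \omega_0 t \right) \right] \\
&= \mathbf{E}_0 \exp\left[ i \left( (n + i\kappa)\, \frac{\omega_0}{c}\, (\mathbf{s} \cdot \mathbf{r}) - \omega_0 t \right) \right] \\
&= \mathbf{E}_0 \exp\left[ i \omega_0 \left( \frac{n}{c}\, \mathbf{s} \cdot \mathbf{r} - t \right) \right] \exp\left[ -\kappa\, \frac{\omega_0}{c}\, (\mathbf{s} \cdot \mathbf{r}) \right]
\end{aligned}
$$

If we assume that the direction of propagation s is in the direction of r (as in a plane wave), then

$$\mathbf{s} \cdot \mathbf{r} = |\mathbf{s}|\, |\mathbf{r}| \cos(0) = |\mathbf{r}| = r$$

so that the electric field may be simplified to:

$$\mathbf{E}_0 \exp\left[ i \omega_0 \left( \frac{n}{c}\, r - t \right) \right] \exp\left[ -\kappa\, \frac{\omega_0}{c}\, r \right] = \mathbf{E}_0 \exp\left[ i \omega_0 \left( \frac{n}{c}\, r - t \right) \right] \exp\left[ -\frac{r}{\delta} \right]$$

where the second term decays with increasing r; it is attenuated. The amplitude decreases by the factor of $e^{-1} = 0.368$ of the incident value when r is equal to the skin depth $\delta$:

$$\delta = \left( \frac{\kappa\, \omega_0}{c} \right)^{-1} = \frac{c}{\kappa\, \omega_0} = \frac{\lambda_0}{2\pi\kappa}$$

where $\lambda_0$ is the wavelength measured in vacuum. The skin depth $\delta$ is a measure of the distance that light will penetrate through the attenuating medium. In a metal, the imaginary part $\kappa$ of the index of refraction can be large, which means that the skin depth is small and the electric field “lies” on the surface of the metal; little field penetrates to the interior.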

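If $\tilde{n}$ (and thus k) is real valued (i.e., $\kappa = 0$, as is true for optically transparent media), then the electric field is

$$
\begin{aligned}
\mathbf{E}_0 \exp\left[ i \omega_0 \left( \frac{n}{c}\, r - t \right) \right] &= \mathbf{E}_0 \exp\left[ i \omega_0 \left( \frac{r}{v} - t \right) \right] \\
&= \mathbf{E}_0 \exp\left[ i \left( n k_0 r - \omega_0 t \right) \right]
\end{aligned}
$$

which confirms that the velocity is

$$v = \frac{c}{n}$$

and that the wavelength in the medium is:

$$\lambda = \frac{\lambda_0}{n}$$

In the case of complex-valued refractive index, the magnetic field is obtained from the electric field via:

$$\mathbf{B} = \frac{\mathbf{k} \times \mathbf{E}}{\omega_0} = \frac{\tilde{n}}{c}\, (\mathbf{s} \times \mathbf{E})$$

where s is the unit vector in the direction of propagation of the light (the direction of the Poynting vector). Note that some authors write $\tilde{n} = n (1 + i\kappa)$, where $\kappa$ is the attenuation index.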

Values of the refractive index for common materials include:

Medium            n
vacuum            1.0 (by definition)
air               ≅ 1.00027
water             1.33
“crown” glass     1.5
“flint” glass     1.7
diamond           2.417
germanium         ≅ 4.0 (only transparent for λ ≳ 2 μm)
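The relations $v = c/n$ and $\lambda = \lambda_0 / n$ can be applied to the tabulated values. This is a minimal sketch; the dictionary simply copies the table, and the 600 nm and 3 μm vacuum wavelengths in the usage lines are arbitrary illustrative choices.

```python
C = 2.99792458e8  # velocity of light in vacuum, m/s

# Refractive indices copied from the table above.
REFRACTIVE_INDEX = {
    "vacuum": 1.0,
    "air": 1.00027,
    "water": 1.33,
    "crown glass": 1.5,
    "flint glass": 1.7,
    "diamond": 2.417,
    "germanium": 4.0,
}


def phase_velocity(n):
    """Phase velocity v = c / n in m/s for a real-valued index n."""
    return C / n


def wavelength_in_medium(wavelength_vacuum, n):
    """Wavelength lambda = lambda0 / n inside the medium."""
    return wavelength_vacuum / n


for medium, n in REFRACTIVE_INDEX.items():
    # Germanium is evaluated at 3 um because it is opaque at visible wavelengths.
    lambda0 = 3.0e-6 if medium == "germanium" else 600e-9
    print(medium, phase_velocity(n), wavelength_in_medium(lambda0, n))
```

For example, light with a vacuum wavelength of 600 nm has a wavelength of 400 nm inside crown glass (n = 1.5), while its temporal frequency is unchanged.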

6.10  Consequences  of  Maxwell’s  Equations 

1. Copropagation of E and B: The wave travels in a direction mutually perpendicular to both E and B, and in fact the propagation direction is defined by the direction:

$$\mathbf{s} \propto \mathbf{E} \times \mathbf{B} \quad \text{(the Poynting vector)}$$

In other words, the wave requires both electric and magnetic fields to propagate, and they copropagate. I like to interpret this result as meaning that the magnetic field provides the medium for propagation of the electric field and vice versa.

2. The electric and magnetic fields of an electromagnetic wave are mutually perpendicular.

3. In vacuum, E and B are in-phase, which means that the phases of the sinusoidal variation of E and B are identical (the fields often are out of phase in some types of matter).

4. Both E and B travel at c, the phase velocity of the wave.

5. Energy is carried by both the electric and magnetic fields, and the magnitude of the energy $\mathcal{E} \propto E_0^2$.

6. There is no limitation on the possible frequencies of the waves, i.e., $0 < \nu < \infty$, which implies the allowed wavelengths are in the interval $\infty > \lambda > 0$.

7. The average power of the light wave per unit area is the “irradiance,” and is determined from the Poynting vector:

$$I[x,y,z,t] = \frac{1}{2} \left( c \epsilon E_0^2 \right)$$

[Figure: Relationship between E and B for a linearly polarized wave traveling from left to right; the fields are in phase.]

6.11  Dispersion  Redux 

Earlier  we  considered  the  effects  of  dispersion  on  traveling  waves  from  a  simple  point 
of  view  where  we  just  assumed  that  waves  with  different  temporal  frequencies  might 
travel  at  different  speeds.  We  called  the  dispersion  “normal”  if  the  velocity  of  waves 
with  longer  wavelengths  exceeds  that  of  waves  with  shorter  wavelengths,  which  also 
means  that  the  index  of  refraction  decreases  with  increasing  wavelength.  At  this 
point,  we  will  try  to  understand  why  this  is  so  and  also  determine  the  conditions  that 
are  necessary  for  anomalous  dispersion  (where  the  index  of  refraction  increases  with 
increasing  wavelength) . 

The first theory of dispersion, based on the understanding of elastic solids, was put forth by Cauchy in 1836. He observed a relationship between the phase velocity of light in a medium and the elasticity e of the solid (the restoring force exerted upon a displaced particle by a neighboring particle) and the density $\rho$ of the medium:

$$v = \sqrt{\frac{e}{\rho}}$$

if measured at wavelengths much longer than the scale of the vibrating particles in the medium. Cauchy deduced a formula for the dependence of the index of refraction on wavelength that bears his name:

$$n[\lambda] = A + \frac{B}{\lambda^2} + \frac{C}{\lambda^4}$$

where A, B, and C are constants determined from measurements of n at three wavelengths. This expression gives a pretty good approximation of n. For example, the Cauchy formula for the index of refraction of air is:

$$n[\lambda] \cong 1.000287566 + \frac{1.3412\ \mathrm{nm}^2}{\lambda^2} + \cdots$$

which decreases with increasing wavelength.
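The Cauchy formula for air can be evaluated at a few wavelengths to see the normal dispersion. This is a minimal sketch that keeps only the constant term and the $1/\lambda^2$ term legible in the text (any higher-order term is omitted, which is an assumption of this example); the function name is hypothetical.

```python
A_AIR = 1.000287566   # constant term of the Cauchy formula for air
B_AIR_NM2 = 1.3412    # coefficient of the 1/lambda^2 term, in nm^2


def cauchy_index_air(wavelength_nm):
    """Two-term Cauchy estimate n = A + B / lambda^2 with lambda in nanometers."""
    return A_AIR + B_AIR_NM2 / wavelength_nm ** 2


for wavelength_nm in (400.0, 550.0, 700.0):
    # The index falls slowly toward A as the wavelength increases.
    print(wavelength_nm, cauchy_index_air(wavelength_nm))
```

The index for blue light (400 nm) is slightly larger than for red light (700 nm), so blue light travels slightly more slowly in air; this is normal dispersion.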

The phenomenon of anomalous dispersion may have been observed (but not pursued) by William Fox Talbot. The first significant study was performed by Le Roux in 1862, who discovered that the index of refraction of a prism containing iodine vapor was 1.020 for red light and 1.019 for blue light, so that $n_{blue} < n_{red}$. Within 10 years, Christiansen noted similar behavior in an aniline dye that exhibited a strong absorption of green light, normal refraction for red, orange, and yellow light, but a smaller refraction angle for blue light. The refractive index of a medium that exhibited strong absorption was seen to increase rapidly as the wavelength was decreased approaching the absorption band. It took some time to produce a theory of matter and light that explained this effect.

Dispersion is due to the interaction of light with matter. Light can be absorbed by matter, where the energy of the light wave is converted to energy of some form in the matter (e.g., it may increase the thermal energy in the matter). Light also can be scattered, where the electric charges in the matter (protons in the nuclei or, more usually, electrons in the atomic shells) absorb and then re-emit electromagnetic waves. If the scattering is elastic, then no energy is transferred to the medium; all of it stays in the electromagnetic wave. If inelastic, some energy is transferred to the medium. Scattering generally occurs when the waves encounter a structure or obstacle whose dimension is smaller than a wavelength, and the wave is re-emitted into a new direction. Electromagnetic waves are scattered by electric charges that may be bound in an atom or free (unbound). A material like glass contains many charges, and the macroscopic effect is the sum of the effects from each individual charge.

The interactions of light with matter are characterized by two numerical factors: the “absorption coefficient” $\alpha$ and the (possibly complex valued) “refractive index” n, which both may be functions of the frequency of the incident light. The absorption is due to transfer of energy from the light to the medium; at frequencies where the absorption coefficient is small, the light can penetrate the matter to a significant depth, and thus the matter is “transparent.”

Light  impinging  on  a  medium  causes  the  charged  particles  (protons  in  atomic 
nuclei  or  electrons  in  atomic  shells)  to  vibrate  at  the  oscillation  frequency  of  the 
light.  These  accelerated  charges  emit  light  of  that  same  oscillation  frequency  in  turn 
(this  is  the  scattered  light).  The  relative  phase  of  the  incident  light  and  the  re-emitted 
light  determines  much  of  the  effect  of  the  medium  on  the  incident  light.  For  example, 
if the incident and scattered waves are out of phase by ≅ 180° in some direction, then the light beam propagating in that direction will be attenuated.

The oscillations damp out because the electrons are influenced by other forces, including the electric forces induced by neighboring charges. Thus the interaction of light with charged particles in matter acts like a so-called “driven” or “forced” damped harmonic oscillator: it is driven by the sinusoidal force induced by the light wave, and its amplitude decreases with time due to the damping. The charges are “bound” to fixed equilibrium locations and can oscillate with one or more resonant frequencies determined by the internal forces due to neighboring atoms. The electric charges can absorb and re-emit the light (i.e., scatter it) in ANY direction. If the frequency of the light is close to the oscillation frequency of one of the resonant states, then some of the electromagnetic energy is retained by the charge and not scattered; it instead increases the energy of the charge. The wave then loses some amplitude when scattered. If the frequency of the light is far from a resonant frequency of the charges, then the scattered light constructively interferes along the same direction as the incident light and it can pass through the medium; in other words, the medium is transparent for light with those frequencies. Even in transparent media, the phase of the re-emitted light generally is shifted by the interaction with the medium.
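The behavior of a driven, damped harmonic oscillator near and far from resonance can be sketched numerically. The following is a minimal illustration using the standard steady-state response for a driving term of the form exp[−iωt]; the symbols `omega_res` (resonant frequency), `gamma` (damping rate), and the dimensionless values used are assumptions for this example, not quantities defined in the notes.

```python
import cmath
import math


def steady_state_response(omega, omega_res, gamma, force_per_mass=1.0):
    """Complex amplitude x0 for x'' + gamma*x' + omega_res^2*x = (F/m)*exp(-i*omega*t)."""
    return force_per_mass / complex(omega_res ** 2 - omega ** 2, -gamma * omega)


def amplitude_and_phase_lag(omega, omega_res, gamma):
    """Magnitude of the response and its phase lag (radians) behind the driving force."""
    x0 = steady_state_response(omega, omega_res, gamma)
    return abs(x0), cmath.phase(x0)


OMEGA_RES = 1.0   # assumed resonant frequency (arbitrary units)
GAMMA = 0.1       # assumed damping rate (same units)

for omega in (0.1, 0.9, 1.0, 1.1, 10.0):
    amplitude, lag = amplitude_and_phase_lag(omega, OMEGA_RES, GAMMA)
    print(omega, amplitude, math.degrees(lag))
```

Far below resonance the charge follows the driving field with almost no lag; at resonance the amplitude is largest and the lag is 90°; far above resonance the lag approaches 180°. The frequency-dependent phase of the re-emitted light is what the following discussion connects to the index of refraction.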

The  first  question  to  consider  is  the  reason  why  the  light  is  “forward  scattered”  in 
transparent  media.  We’ve  actually  already  given  the  answer;  the  light  emitted  in  this 
direction  by  all  atoms  interferes  constructively  and  light  emitted  in  other  directions 
interferes  destructively.  The  charges  in  matter  that  scatter  the  light  may  be  viewed  as 
uniformly  distributed  and  quite  close  together  (separated  by  fractions  of  nanometers, 
significantly less than visible wavelengths). The phase “lag” of the forward-scattered light is equivalent to a “slowing down” of the light wave, hence the index of refraction.


6.11.1  Feynman’s  Model  for  Refractive  Index 


From my point of view, the best discussion of refraction and dispersion was given by Richard Feynman in his famous Lectures on Physics, Volume I, Chapter 31. This discussion is derived from his treatment.

Consider an electromagnetic wave incident on a thin plate of glass. The source point is assumed to be a large distance away (to the left) and the observation point also is a large distance to the right (thus the figure is “not to scale!”).

[Figure, with labels “Glass Plate” and “Evaluate E Here”: Electric waves interacting with a layer of transparent glass: some of the field is reflected and the original field plus a “correction” term are transmitted.]


The  electric  field  at  the  measurement  point  is  the  sum  of  the  original  electric  field 
plus  a  correction  term  E2  due  to  emission  by  the  electric  charges  in  the  glass  plate.  We 
can  assume  that  the  charges  are  electrons,  since  the  protons  are  much  more  massive. 
The  electrons  in  the  glass  oscillate  under  the  influence  of  the  incoming  electric  field 
from  the  source  and  thus  emit  their  own  electric  fields.  The  observed  field  includes 
contributions from these charges. These modifications occur in a way that makes the
field inside the glass “appear” to be moving at a different phase velocity; this is the
reason why the index of refraction of glass is larger than one.


Mathematical  Picture  of  Phase  Change  of  Light  Due  to  Glass 

Consider  one  electron  in  the  glass;  it  “feels”  the  effect  of  the  incident  field  and  the 
fields  generated  by  all  of  the  other  charges  in  the  glass,  and  the  motions  of  all  of  these 
other  charges  are  influenced  by  that  one  electron  that  we  are  observing.  To  simplify 
the  problem,  we  assume  that  the  influences  of  the  other  atoms  are  small  relative  to  the 
effect  of  the  source,  so  that  the  total  field  at  the  observation  point  is  little  affected 
by the motions of the other charges. In effect, we are assuming that the index of
refraction of the glass is very close to one. The calculation will produce a field
that travels in the same direction as the incident field ($E_2$) and a field that travels
in the opposite direction ($E_1$, the “reflected” field), but the latter is small because
$n \approx 1$.

Because its source is far away, the incident electric field is a traveling plane wave
that may be written in complex notation:

$$E_s = E_0 \exp\left[+i\left(k_0 z - \omega t\right)\right] = E_0 \exp\left[+i\omega\left(\frac{z}{c} - t\right)\right]$$

where the factoring (using $k_0 = \omega/c$) conveniently isolates the distance $z$. Here $\omega$ is the angular temporal
frequency of the driving force, and thus is a variable in the problem (rather than a
parameter of the wave), hence it does not have a subscript. Assume that $z = 0$ at
the “front” (input) side of the plate and $z = \Delta z$ at the back side, so that the phase
of the electric field at the front of the plate is:

$$\phi[0, t] = -\omega t$$

If there were no glass, then the phase observed at the back of the plate would be
identical to that observed at the front of the plate at the earlier time $t' = t - \Delta z / c$:

$$\phi[\Delta z, t; \text{vacuum}] = \omega\left(\frac{\Delta z}{c} - t\right) = -\omega t + \omega\frac{\Delta z}{c}$$


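where the phase increment is due to the delay while the wave travels the extra distance
$\Delta z$. If the glass “slows down” the light so that the velocity in the glass is $v < c$, then
the phase at the rear of the glass will include a larger increment:

$$\phi[\Delta z, t; \text{glass}] = \omega\left(\frac{\Delta z}{v} - t\right) > \omega\left(\frac{\Delta z}{c} - t\right)$$

If we substitute the index of refraction $n = c/v$, then both phases are written in
terms of the same velocity $c$:

$$\phi[\Delta z, t; \text{glass}] = \omega\left(\frac{n \cdot \Delta z}{c} - t\right) > \omega\left(\frac{\Delta z}{c} - t\right)$$

Thus the “additional” phase due to the extra time to travel through the glass is:

$$\omega\left(\frac{n \cdot \Delta z}{c} - t\right) - \omega\left(\frac{\Delta z}{c} - t\right) = \omega\,\frac{(n-1)\cdot\Delta z}{c}$$

The electric field at the back of the glass is:

$$E_{after} = E_0 \exp\left[+i\omega\left(\frac{\Delta z}{c} - t\right)\right] \cdot \exp\left[+i\omega\,\frac{(n-1)\cdot\Delta z}{c}\right]$$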


Thus the contribution to the electric field due to the glass plate may be interpreted
as an additive contribution to the phase instead of an additive contribution to the
amplitude. The term with $(n - 1)$ is the change due to the glass and will be evaluated
by physical arguments in a moment. It may be expanded in a Taylor series:

$$\exp[+i\theta] = \frac{(+i\theta)^0}{0!} + \frac{(+i\theta)^1}{1!} + \frac{(+i\theta)^2}{2!} + \cdots + \frac{(+i\theta)^N}{N!} + \cdots$$

$$\exp\left[+i\omega\,\frac{(n-1)\cdot\Delta z}{c}\right] = 1 + i\omega\,\frac{(n-1)\cdot\Delta z}{c} + \frac{(+i)^2}{2!}\left(\omega\,\frac{(n-1)\cdot\Delta z}{c}\right)^2 + \cdots + \frac{(+i)^N}{N!}\left(\omega\,\frac{(n-1)\cdot\Delta z}{c}\right)^N + \cdots$$

If we assume that the glass is thin, so that $\Delta z \to 0$, then we can ignore all terms of
order two or larger:

$$\exp\left[+i\omega\,\frac{(n-1)\cdot\Delta z}{c}\right] \approx 1 + i\omega\,\frac{(n-1)\cdot\Delta z}{c}$$
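The quality of the thin-plate approximation can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names and the numerical values ($n = 1.5$, green light, three plate thicknesses) are illustrative assumptions. It compares the exact phase factor with its first-order Taylor approximation.

```python
import cmath
import math

C_LIGHT = 3.0e8  # speed of light in m/s (rounded)

def exact_phase_factor(omega, n, dz, c=C_LIGHT):
    """Exact factor exp[+i*omega*(n-1)*dz/c] contributed by the plate."""
    theta = omega * (n - 1.0) * dz / c
    return cmath.exp(1j * theta)

def thin_plate_factor(omega, n, dz, c=C_LIGHT):
    """First-order Taylor approximation 1 + i*omega*(n-1)*dz/c."""
    theta = omega * (n - 1.0) * dz / c
    return 1.0 + 1j * theta

# Assumed illustrative values: green light and an index of 1.5.
omega = 2.0 * math.pi * 5.45e14   # angular frequency in rad/s
n_glass = 1.5
thicknesses = [1e-9, 1e-8, 1e-7]  # plate thickness in m

errors = []
for dz in thicknesses:
    err = abs(exact_phase_factor(omega, n_glass, dz)
              - thin_plate_factor(omega, n_glass, dz))
    errors.append(err)
    print(f"dz = {dz:.0e} m  ->  |exact - approx| = {err:.3e}")
```

The error grows roughly as the square of the phase $\omega (n-1) \Delta z / c$, which is the size of the first neglected term in the series.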


and thus the electric field at a large distance “behind” the glass plate is approximately:

$$E_{after} = E_s + E_2 \approx E_0 \exp\left[+i\omega\left(\frac{z}{c} - t\right)\right] \cdot \left(1 + i\omega\,\frac{(n-1)\cdot\Delta z}{c}\right)$$

$$= E_0 \exp\left[+i\omega\left(\frac{z}{c} - t\right)\right] + i\omega\,\frac{(n-1)\cdot\Delta z}{c}\left(E_0 \exp\left[+i\omega\left(\frac{z}{c} - t\right)\right]\right)$$

The first term is just the source field:

$$E_0 \exp\left[+i\omega\left(\frac{z}{c} - t\right)\right] = E_s$$

and the second term is identified as the contribution from the charges within the
glass, which is labeled $E_2$ in the figure:

$$+i\omega\,\frac{(n-1)\cdot\Delta z}{c}\left(E_0 \exp\left[+i\omega\left(\frac{z}{c} - t\right)\right]\right) = E_2$$

The leading factor is $+i = \exp\left[+i\frac{\pi}{2}\right]$, which indicates that the electric
field from the charges in the glass is out of phase with the original electric field by
$+\frac{\pi}{2}$ radians, as shown on the vector (phasor) diagram of the contributions of the two
fields:

[Figure: Argand diagram with axes “Real” and “Imaginary” showing the phasor sum $E_0 + E_2$ at phase angle $\phi = +\omega_0 (n - 1) \Delta z / c$.]

Argand diagram of phasor contributions from incident field $E_0$ and field due to
charges in the glass $E_2$, which is oriented approximately perpendicular to $E_0$ and
“delays” the phase of the electric field.


Physical  Picture  of  Electron  Oscillations 

If  the  field  E2  thus  evaluated  can  be  expressed  in  terms  of  the  oscillating  charges  in 
the  glass,  then  we  will  have  explained  the  behavior  of  the  refractive  index.  Again,  we 
assume  that  the  incident  field  has  the  form  of  a  plane  wave: 


$$E_s[z, t] = E_0 \exp\left[+i\left(k_0 z - \omega t\right)\right]$$

At the “front edge” of the glass ($z = 0$), the field is the same as given before:

$$E_s[0, t] = E_0 \exp[-i\omega t]$$

The electrons in the glass “feel” this electric field and are driven in the same direction
by the force:

$$F = eE_s[0, t] = eE_0 \exp[-i\omega t]$$

(there also is a magnetic field, but its effect on the electrons is so much smaller that
it can be ignored). The electrons have mass $m$ and act as though bound to the atoms
by little springs that exert restoring forces proportional to the distance of the electron
from its equilibrium position $x_{eq}$. With spring constant $k$, the restoring force has the
form:

$$F = -k\,(x - x_{eq})$$

In what follows, $x$ is measured from equilibrium, so that $x_{eq} = 0$. The electron
oscillates at its “normal frequency,” which is:

$$\omega_0 = \sqrt{\frac{k}{m}}$$

this is a parameter of the electron + spring system (and thus has a subscript). The
equation of motion of the electrons in the glass is:

$$m\frac{d^2x}{dt^2} + m\omega_0^2 x = F = eE_0 \exp[-i\omega t]$$


where  the  last  term  is  the  “driving  force”  due  to  the  electric  field  with  the  variable 
angular  temporal  frequency  cu.  We  solve  this  equation  by  standard  methods  of  differ¬ 
ential  equations;  we  assume  that  the  position  x  of  each  electron  also  oscillates  about 
its  equilibrium  point  at  the  same  rate.  The  amplitude  of  the  oscillation  is  assumed 
to  be  x0 : 

$$x = x_0 \exp[-i\omega t]$$

The derivatives needed in the equation of motion are

$$\frac{dx}{dt} = -i\omega x_0 \exp[-i\omega t] = -i\omega x$$

$$\frac{d^2x}{dt^2} = (-i\omega)^2 x_0 \exp[-i\omega t] = -\omega^2 x$$

So the equation of motion is simplified to:

$$m\left(-\omega^2 x_0\right) \exp[-i\omega t] + m\omega_0^2 x_0 \exp[-i\omega t] = eE_0 \exp[-i\omega t]$$

$$\implies \left(-m\omega^2 + m\omega_0^2\right) x_0 = eE_0$$

And thus the amplitude of the electron displacement from equilibrium may be expressed
in terms of the “normal” oscillation frequency, mass, electric charge, and the
amplitude and oscillation frequency of the incident electric field:

$$x_0 = \frac{eE_0}{m\left(\omega_0^2 - \omega^2\right)}$$

Therefore the motion of EACH individual charge in the thin glass plate due to the
incident electric field is given by the simple expression:

$$x[t] = x_0 \exp[-i\omega t] = \frac{eE_0}{m\left(\omega_0^2 - \omega^2\right)} \exp[-i\omega t]$$


Note  that  this  does  not  include  the  initial  positions  of  the  charges,  which  are  (obvi¬ 
ously)  different  for  each. 
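As a quick check of the driven-oscillator result, the sketch below evaluates the amplitude $x_0$ and substitutes it back into the equation of motion. The function names, the SI values of $e$ and $m$, and the choice of resonance and driving frequencies are assumptions made only for illustration; the text itself does not fix a unit system.

```python
E_CHARGE = 1.602e-19    # magnitude of the electron charge in C (assumed SI value)
M_ELECTRON = 9.109e-31  # electron mass in kg (assumed SI value)

def displacement_amplitude(e0, omega, omega0, e=E_CHARGE, m=M_ELECTRON):
    """Amplitude x0 = e*E0 / (m*(omega0^2 - omega^2)) of the driven electron."""
    return e * e0 / (m * (omega0**2 - omega**2))

def equation_of_motion_residual(x0, e0, omega, omega0, e=E_CHARGE, m=M_ELECTRON):
    """Left side minus right side of m*(-omega^2)*x0 + m*omega0^2*x0 = e*E0."""
    return m * (-omega**2) * x0 + m * omega0**2 * x0 - e * e0

# Hypothetical numbers: an ultraviolet resonance driven by visible light.
omega0 = 2.0e16   # resonant angular frequency in rad/s (assumption)
omega = 3.4e15    # driving angular frequency in rad/s (assumption)
e0 = 1.0          # field amplitude in V/m (assumption)

x0_below = displacement_amplitude(e0, omega, omega0)
x0_above = displacement_amplitude(e0, 3.0e16, omega0)
residual = equation_of_motion_residual(x0_below, e0, omega, omega0)

print("x0 below resonance:", x0_below)
print("x0 above resonance:", x0_above)
print("relative residual :", abs(residual) / (E_CHARGE * e0))
```

The sign of $x_0$ flips when the driving frequency passes through $\omega_0$: below resonance the displacement follows the force, above resonance it opposes it.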

We  now  must  calculate  the  field  at  the  observation  point  (distant  from  the  plate) 
due  to  a  “thin  plane”  of  charges  that  all  move  with  this  same  motion  x  [t] .  We  find 
the  field  at  the  observation  point  by  adding  the  contributions  from  each  of  the  charges 
in  the  glass.  The  electric  field  radiated  by  each  electron  in  the  glass  is  proportional 
to  the  acceleration  just  evaluated: 


$$\frac{d^2x}{dt^2} = (-i\omega)^2 x_0 \exp[-i\omega t] = -\omega^2 x$$

The electric field at large distances from a charge that oscillates perpendicularly to
the line of sight decreases approximately as the reciprocal of the distance $r$ and includes
the time delay $r/c$ for the field to arrive:

$$E_e[r, t] \propto \frac{e}{r}\left(-\omega^2 x_0\right) \exp\left[-i\omega\left(t - \frac{r}{c}\right)\right]$$


We  assume  that  the  observation  point  is  so  far  away  that  the  field  oscillates  approxi¬ 
mately  perpendicular  to  the  “line  of  sight.” 

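The total field at the observation point is the vector sum of the contributions from
the individual electrons in the thin glass plate. These contributions may be integrated
in polar coordinates, with $\rho$ the radial coordinate in the plane of the plate. If $\eta$ is
the number density of the electrons per unit area in the glass, then the electric field is:

$$E_{all} = \int_{\rho=0}^{\rho=+\infty} \frac{e}{r}\left(-\omega^2 x_0\right) \exp\left[-i\omega\left(t - \frac{r}{c}\right)\right] \cdot \eta \cdot 2\pi\rho \, d\rho$$

$$= 2\pi\eta e\left(-\omega^2 x_0\right) \exp[-i\omega t] \int_{\rho=0}^{\rho=+\infty} \exp\left[+i\omega\frac{r}{c}\right] \cdot \frac{\rho}{r} \, d\rho$$

where

$$r^2 = \rho^2 + z^2 \implies r \, dr = \rho \, d\rho$$

Therefore:

$$\int_{\rho=0}^{\rho=+\infty} \exp\left[+i\omega\frac{r}{c}\right] \cdot \frac{\rho}{r} \, d\rho = \int_{r=z}^{r=+\infty} \exp\left[+i\omega\frac{r}{c}\right] \cdot \frac{1}{r} \cdot r \, dr = \int_{r=z}^{r=+\infty} \exp\left[+i\omega\frac{r}{c}\right] dr$$

$$= \frac{c}{i\omega}\left(\exp[+i \cdot \infty] - \exp\left[+i\omega\frac{z}{c}\right]\right)$$

Note that the term $\exp[+i \cdot \infty]$ oscillates rapidly, so we can assume that it averages to zero:

$$E_{all} = 2\pi\eta e\left(-\omega^2 x_0\right) \exp[-i\omega t] \cdot \left(-\frac{c}{i\omega}\right) \exp\left[+i\omega\frac{z}{c}\right]$$

$$= \left(2\pi\eta c e x_0\right) \cdot (-i\omega) \exp\left[-i\omega\left(t - \frac{z}{c}\right)\right]$$

$$= \left(2\pi\eta c e\right) \cdot (-i\omega)\, x_0 \exp[-i\omega t] \exp\left[+i\omega\frac{z}{c}\right]$$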


which shows that the field measured at the observation point due to all of the oscillating
charges is out of phase by $\frac{\pi}{2}$ radians and delayed.
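The step of discarding $\exp[+i \cdot \infty]$ can be made plausible numerically. The sketch below is an illustration under a stated assumption that does not appear in the text: a weak convergence factor $\exp[-\epsilon r]$ is inserted so that contributions from very distant charges taper off. The function names and parameter values are hypothetical. As $\epsilon$ becomes small, the integral approaches the closed form $-\frac{c}{i\omega}\exp[+i\omega z/c]$ used above.

```python
import cmath

def sheet_integral_numeric(omega_over_c, z, eps=0.01, r_max=2000.0, dr=0.01):
    """Trapezoid estimate of the integral of exp[(i*omega/c - eps)*r] from z to r_max.

    eps is an assumed damping rate that tapers the far contributions.
    """
    n_steps = int(round((r_max - z) / dr))
    a = 1j * omega_over_c - eps
    total = 0.5 * (cmath.exp(a * z) + cmath.exp(a * (z + n_steps * dr)))
    for j in range(1, n_steps):
        total += cmath.exp(a * (z + j * dr))
    return total * dr

def sheet_integral_limit(omega_over_c, z):
    """Closed form -(c / (i*omega)) * exp[+i*omega*z/c] with the oscillating term dropped."""
    return -cmath.exp(1j * omega_over_c * z) / (1j * omega_over_c)

# Dimensionless example: omega/c = 1 and z = 1 (arbitrary units).
numeric_value = sheet_integral_numeric(1.0, 1.0)
limit_value = sheet_integral_limit(1.0, 1.0)
print("numeric (damped):", numeric_value)
print("closed form     :", limit_value)
print("difference      :", abs(numeric_value - limit_value))
```

The remaining difference is of the order of $\epsilon$ and shrinks as the damping is made weaker (with `r_max` increased accordingly).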

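All we need to do now is substitute the formula for $x_0$ from the driven harmonic
oscillator that was derived in the previous section:

$$E_2 = \left(2\pi\eta c e\right) \cdot (-i\omega)\, \frac{eE_0}{m\left(\omega_0^2 - \omega^2\right)} \exp[-i\omega t] \exp\left[+i\omega\frac{z}{c}\right]$$

$$\boxed{E_2 = -i\omega\, \frac{2\pi\eta c e^2}{m\left(\omega_0^2 - \omega^2\right)}\, E_0 \exp\left[+i\omega\left(\frac{z}{c} - t\right)\right]}$$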


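In words, the electrons in the glass that oscillate due to the incident field emit a wave
that travels in the same direction (towards $z = +\infty$). The amplitude of the wave
is proportional to the number density of electrons $\eta$ and to the strength of the source field $E_0$.
This field has the same form as the correction term evaluated before from the phase delay:

$$E_2 = +i\omega\,\frac{(n-1)\cdot\Delta z}{c}\left(E_0 \exp\left[+i\omega\left(\frac{z}{c} - t\right)\right]\right)$$

apart from the overall sign, which is set by the sign convention adopted for the field
radiated by an accelerating charge. The two expressions agree in magnitude if we identify
the corresponding factors:

$$\frac{(n-1)\cdot\Delta z}{c} = \frac{2\pi\eta c e^2}{m\left(\omega_0^2 - \omega^2\right)}$$

$$\implies (n-1)\,\Delta z = \frac{2\pi\eta c^2 e^2}{m\left(\omega_0^2 - \omega^2\right)}$$

$$\implies n = 1 + \frac{2\pi c^2 e^2}{m\left(\omega_0^2 - \omega^2\right)} \cdot \frac{\eta}{\Delta z}$$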
We now define $N$ to be the number density of electrons per unit volume in the glass.
The number density $\eta$ per unit area in the thin sheet is the product of $N$ and the
thickness $\Delta z$:

$$\eta = N \cdot \Delta z \implies \frac{\eta}{\Delta z} = N$$

Thus the index of refraction can be written as:

$$n = 1 + \frac{2\pi N c^2 e^2}{m\left(\omega_0^2 - \omega^2\right)} = 1 + \frac{\alpha}{\omega_0^2 - \omega^2}$$

where the constant $\alpha = 2\pi N c^2 e^2 / m$ includes the contributions from the number density, the mass,
the charge on the electron, and the velocity of light. This is the frequency-dependent
index of refraction in the simple model of a thin sheet of glass with oscillating bound
electrons. This expression is graphed for the case $\omega_0 = 2$ and $\alpha = 1$, showing that the
index is largest for light with temporal frequency $\omega \approx \omega_0$, i.e., in the vicinity of the
frequency of the “resonant” oscillations of the electrons due to the restoring forces.
In this simple model, the index of refraction can be less than unity, which indicates
that this picture is incomplete.


Graph of $n$ vs. $\omega$ assuming that $\omega_0 = 2$ and $\alpha = 2\pi N c^2 e^2 / m = 1$, showing that the index
generally increases with increasing $\omega$, which means that $n$ decreases with increasing
wavelength; this is normal dispersion.
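The behavior described for the graph can be reproduced with a few lines of code. This is a minimal sketch using the same normalized values as the graph ($\omega_0 = 2$, $\alpha = 1$); the function name and the sample frequencies are illustrative choices.

```python
def refractive_index(omega, omega0=2.0, alpha=1.0):
    """Undamped single-resonance model: n = 1 + alpha / (omega0^2 - omega^2)."""
    return 1.0 + alpha / (omega0**2 - omega**2)

# Sample frequencies below the resonance (normal dispersion) ...
below_resonance = [refractive_index(w) for w in (0.5, 1.0, 1.5)]
# ... and one frequency above it, where this simple model gives n < 1.
above_resonance = refractive_index(3.0)

for w, n in zip((0.5, 1.0, 1.5), below_resonance):
    print(f"omega = {w:3.1f}  ->  n = {n:.4f}")
print(f"omega = 3.0  ->  n = {above_resonance:.4f}")
```

Below resonance the index rises with frequency, and it diverges at $\omega = \omega_0$ because the model contains no damping.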

In thick glass, the fields from different thin sheets interact, which increases the
mean refractive index. Also, the oscillations of the electrons are actually “damped”
by other forces in the glass. The damping pushes the extrema of $n$ towards its
mean value. To derive this behavior, we could continue the discussion in the same
vein (most books do, including Feynman and Hecht). Instead, we can demonstrate a
useful connection to linear systems theory and thus simplify the derivation.

We can think of the action of light on matter as a linear shift-invariant system
with a “causal” impulse response, i.e., a temporal system whose impulse response is
zero for $t < 0$ and thus cannot “respond” until stimulated. An appropriate impulse
response for a “causal” damped oscillating system is:

$$h[t] = A_0 \exp[-\gamma_0 t] \cdot STEP[t] \cdot \sin[2\pi\nu_0 t]$$

$$= A_0 \exp[-\gamma_0 t] \cdot STEP[t] \cdot \cos\left[2\pi\nu_0 t - \frac{\pi}{2}\right]$$

where $\gamma_0$ is the damping coefficient (i.e., the reciprocal of the time required for the
envelope of the output response to decrease by the factor $e^{-1} \approx 0.368$), $\nu_0$ is the natural oscillating frequency
of the charged particle (the electron), and $-\frac{\pi}{2}$ is the initial phase. This function $h[t]$
measures the “response” of the system, which in our case is the position of the electron
relative to its equilibrium position. The STEP function ensures that $h[t < 0] = 0$,
meaning that the electron “sits” at its equilibrium position until disturbed; this forces
the system to be causal. When stimulated by a “pulse of light” (modeled by a
Dirac delta function $\delta[t]$), the position of the electron increases from zero following a
sinusoidal curve, but the amplitude of the sine wave decreases with time due to the
decaying exponential, as shown:


[Figure: plot of $h[t]$ versus $t$.]

Impulse response of damped harmonic oscillator:
$h[t] = \exp[-\gamma_0 t] \cdot STEP[t] \cdot \sin[2\pi\nu_0 t]$ where $A_0 = 1$, $\nu_0 = 2$, and $\gamma_0 =$
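The causal impulse response can be sampled directly. The sketch below is illustrative only: the function name is hypothetical, and the damping value $\gamma_0 = 0.5$ is an assumption borrowed from the transfer-function example later in this section, since the value for this figure is not given.

```python
import math

def impulse_response(t, a0=1.0, nu0=2.0, gamma0=0.5):
    """h[t] = A0 * exp[-gamma0*t] * STEP[t] * sin[2*pi*nu0*t]; zero for t < 0."""
    if t < 0.0:
        return 0.0  # STEP[t] enforces causality
    return a0 * math.exp(-gamma0 * t) * math.sin(2.0 * math.pi * nu0 * t)

# The first maximum of the sine occurs a quarter period after the impulse.
t_quarter = 1.0 / (4.0 * 2.0)
samples = [(t, impulse_response(t)) for t in (-1.0, 0.0, t_quarter, 2.0 + t_quarter)]
for t, h in samples:
    print(f"t = {t:6.3f}  ->  h[t] = {h:+.4f}")
```

Comparing the values at `t_quarter` and at `2.0 + t_quarter` (four full periods later) shows the decay of the envelope by $\exp[-\gamma_0 \cdot 2] = e^{-1}$ for the assumed damping.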


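The response of this system is characterized by the transfer function, which is
the 1-D temporal Fourier transform of the impulse response. We use the known Fourier
transforms:

$$\mathcal{F}_1\left\{e^{-t} \cdot STEP[t]\right\} = \frac{1}{1 + 2\pi i\nu} = \frac{1 + i(-2\pi\nu)}{1 + (2\pi\nu)^2}$$

$$\mathcal{F}_1\left\{\sin[2\pi\nu_0 t]\right\} = \frac{i}{2}\left(\delta[\nu + \nu_0] - \delta[\nu - \nu_0]\right)$$

and the scaling theorem:

$$\mathcal{F}_1\{f[t]\} = F[\nu] \implies \mathcal{F}_1\{f[\gamma_0 t]\} = \frac{1}{|\gamma_0|} F\left[\frac{\nu}{\gamma_0}\right]$$

to derive the transfer function of this system:

$$H[\nu] = A_0 \mathcal{F}_1\left\{\sin[2\pi\nu_0 t] \cdot \left(\exp[-\gamma_0 t] \cdot STEP[t]\right)\right\}$$

$$= A_0 \mathcal{F}_1\left\{\sin[2\pi\nu_0 t] \cdot \left(\exp[-\gamma_0 t] \cdot STEP[\gamma_0 t]\right)\right\}$$

$$= A_0 \left(\frac{i}{2}\left(\delta[\nu + \nu_0] - \delta[\nu - \nu_0]\right)\right) * \frac{1}{\gamma_0} \cdot \frac{1}{1 + 2\pi i \frac{\nu}{\gamma_0}}$$

$$= \frac{iA_0}{2\gamma_0}\left(\frac{1 - 2\pi i\left(\frac{\nu + \nu_0}{\gamma_0}\right)}{1 + 4\pi^2\left(\frac{\nu + \nu_0}{\gamma_0}\right)^2} - \frac{1 - 2\pi i\left(\frac{\nu - \nu_0}{\gamma_0}\right)}{1 + 4\pi^2\left(\frac{\nu - \nu_0}{\gamma_0}\right)^2}\right)$$

With $\omega = 2\pi\nu$ and $\omega_0 = 2\pi\nu_0$, the real and imaginary parts are:

$$\mathrm{Re}\{H[\nu]\} = \frac{A_0\pi}{\gamma_0^2}\left(\frac{\nu + \nu_0}{1 + 4\pi^2\left(\frac{\nu + \nu_0}{\gamma_0}\right)^2} - \frac{\nu - \nu_0}{1 + 4\pi^2\left(\frac{\nu - \nu_0}{\gamma_0}\right)^2}\right) = \frac{A_0}{2}\left(\frac{\omega + \omega_0}{\gamma_0^2 + (\omega + \omega_0)^2} - \frac{\omega - \omega_0}{\gamma_0^2 + (\omega - \omega_0)^2}\right)$$

$$\mathrm{Im}\{H[\nu]\} = \frac{A_0\gamma_0}{2}\left(\frac{1}{\gamma_0^2 + (\omega + \omega_0)^2} - \frac{1}{\gamma_0^2 + (\omega - \omega_0)^2}\right)$$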

The transfer function acts as a frequency-dependent “scale factor” applied to the
amplitude of the electron oscillation. The graphs of the real part, imaginary part,
magnitude, and phase are shown below, where the domain is assumed to include
negative temporal frequencies. In this example, $\nu_0 = 2$ and $\gamma_0 = 0.5$.

In words, the transfer function measures the “response” of the system, i.e., the
amplitude of the oscillation of the charged particle, which we called $x[t]$ before:

$$x[t] \propto x_0 \cdot \left(\mathrm{Re}\{H[\nu]\} + i\,\mathrm{Im}\{H[\nu]\}\right) \cdot \exp[-i\omega t]$$

$$= x_0\,\frac{A_0}{2}\left(\frac{\omega + \omega_0 + i\gamma_0}{\gamma_0^2 + (\omega + \omega_0)^2} - \frac{\omega - \omega_0 + i\gamma_0}{\gamma_0^2 + (\omega - \omega_0)^2}\right) \exp[-i\omega t]$$

The amplitude is again a function of the temporal frequency of the incident light.
Note that the phase of the transfer function is approximately 0 radians for $|\nu| < |\nu_0|$,
approximately $-\pi$ radians for $\nu > \nu_0$, and $-\frac{\pi}{2}$ radians if the frequency of the incident
light is $+\nu_0$. This means that the system response (the oscillation of the charged
particle) is “in phase” if the frequency of the incident light is less than the “resonant
frequency” of the oscillating charge. The oscillation of the charged particle is “out of
phase” if the frequency of the incident light is larger than the resonant frequency. Also
note from the magnitude that the system response is quite large near resonance, which
means that the oscillation of the charged particle is large. Of course the meaning of
light with a negative temporal frequency is not very clear in this context; the difficulty
can be resolved by recognizing that the real and imaginary parts of the response of the
system are related due to causality, a fact reflected in the Kramers-Kronig equations,
which are beyond the scope of this discussion.
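The phase behavior just described can be verified from the closed form of the transfer function. The sketch below assumes the Fourier convention $\mathcal{F}\{f\} = \int f[t] \exp[-2\pi i \nu t]\, dt$ and uses the example values $\nu_0 = 2$, $\gamma_0 = 0.5$; the function name and the three probe frequencies are illustrative choices.

```python
import cmath
import math

def transfer_function(nu, a0=1.0, nu0=2.0, gamma0=0.5):
    """Fourier transform of A0*exp[-gamma0*t]*STEP[t]*sin[2*pi*nu0*t].

    Written as the difference of two shifted one-sided exponential spectra.
    """
    shifted_up = 1.0 / (gamma0 + 2j * math.pi * (nu + nu0))
    shifted_down = 1.0 / (gamma0 + 2j * math.pi * (nu - nu0))
    return 0.5j * a0 * (shifted_up - shifted_down)

h_low = transfer_function(0.1)    # well below resonance
h_res = transfer_function(2.0)    # at resonance
h_high = transfer_function(20.0)  # well above resonance

for label, h in (("below", h_low), ("at", h_res), ("above", h_high)):
    print(f"{label:5s} resonance: |H| = {abs(h):.4f}, phase = {cmath.phase(h):+.4f} rad")
```

The magnitude peaks at resonance, and the phase moves from about 0 through $-\pi/2$ towards $-\pi$ as the frequency increases.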

The index of refraction is proportional to this amplitude plus a bias, so we can
use the graphs of $H[\nu]$ to understand the frequency behavior of $n$. As mentioned
previously, the index of refraction decreases with increasing wavelength (increases
with increasing temporal frequency), thus the phase velocity in a medium of light
with longer wavelengths is larger and the dispersion is normal. The phase velocity
of the modulation wave (the group velocity) is less than the phase velocity of the
average wave, and messages travel more slowly than the carrier wave that conveys
the message.

In the vicinity of an absorption due to the resonance of charged particles in the
medium, the index of refraction increases with increasing wavelength over a small
range, which means that shorter wavelengths travel faster and the dispersion is anomalous.
In this region, the phase velocity of the modulation wave is larger than the
phase velocity of the average wave. This implies that messages can travel faster than
the velocity of light. HOWEVER, since this only happens where light is absorbed,
the message cannot propagate.


Real and imaginary parts of refractive index in vicinity of a “weak” absorption.

Real and imaginary parts of refractive index for multiple absorptions. Note that
anomalous dispersion only occurs in vicinity of absorptions, so that light cannot
propagate (from Fowles, Introduction to Modern Optics, Dover, 1975).
6.12  Dual  Nature  of  Light:  Photons

In many contexts, the “particle” picture of light is more appropriate. In imaging, for
example, consider images created with different exposure times. Photographs taken
with shorter exposures generally look grainier:

Images created from increasing numbers of photons, showing increase in signal-to-noise ratio.

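A “particle of light” is the photon, whose energy is proportional to the temporal
frequency:

$$E = h\nu = \frac{hc}{\lambda} = \hbar\omega$$

where $h$ is Planck’s constant, which is often normalized by a factor of $2\pi$ and then called
“h-bar”:

$$h = 6.626 \times 10^{-34}\ \text{J} \cdot \text{s} = 6.626 \times 10^{-27}\ \text{erg} \cdot \text{s}$$

$$\hbar = \frac{h}{2\pi} \approx 1.055 \times 10^{-34}\ \text{J} \cdot \text{s}$$

If $\lambda = 550$ nm, the energy per photon is only:

$$E = \left(6.626 \times 10^{-34}\ \text{J} \cdot \text{s}\right) \frac{3 \times 10^{8}\ \frac{\text{m}}{\text{s}}}{550\ \text{nm}} \approx 3.6 \times 10^{-19}\ \text{J}$$

The “photon flux” is the number of photons per second in a light beam:

$$\Phi = \frac{P}{h\nu}, \text{ where } P \text{ is the power}$$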
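The two formulas above are easy to evaluate. The following is a small sketch with rounded constants; the function names and the 1 mW beam power are assumptions chosen only for illustration.

```python
PLANCK_H = 6.626e-34  # Planck's constant in J*s (rounded)
LIGHT_C = 2.998e8     # speed of light in m/s (rounded)

def photon_energy(wavelength_m):
    """Energy per photon E = h*c/lambda in joules."""
    return PLANCK_H * LIGHT_C / wavelength_m

def photon_flux(power_w, wavelength_m):
    """Photons per second Phi = P/(h*nu) for a monochromatic beam of power P."""
    return power_w / photon_energy(wavelength_m)

energy_550 = photon_energy(550e-9)
flux_1mw = photon_flux(1.0e-3, 550e-9)  # hypothetical 1 mW beam at 550 nm

print(f"energy per photon at 550 nm: {energy_550:.3e} J")
print(f"photon flux of a 1 mW beam : {flux_1mw:.3e} photons/s")
```

Even a modest beam therefore delivers an enormous number of photons per second, which is why the granularity of light is noticeable only at low exposure.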

Typical  fluxes  per  unit  area  for  some  sources  are  shown  in  the  table 




Light Source        $\Phi/A$ in photons/(sec $\cdot$ m$^2$)

focused laser       $10^{26}$
unfocused laser     $10^{21}$
bright sunlight     $10^{18}$
indoor light        $10^{16}$
twilight            $10^{14}$
moonlight           $10^{12}$
starlight           $10^{10}$

The pattern of photon arrivals tells something about the source. Random (incoherent)
light sources (such as light bulbs) emit photons with random arrival times
and a Bose-Einstein distribution. Coherent light sources, on the other hand, emit
photons with a Poisson distribution, which is more uniform but still random.


6.12.1  Momentum  of  Photons 

Atoms  that  emit  photons  “recoil”  in  the  opposite  direction,  and  surfaces  that  absorb 
photons  also  recoil.  The  momentum  of  a  single  photon  is 

$$p = \frac{h}{\lambda} = \hbar k$$

The  pressure  due  to  radiation  is  the  force  per  unit  area,  which  is  equal  to  the  energy 
per  unit  volume,  or  the  energy  density.  Radiation  pressures  are  often  neglected,  but 
cannot  be  if  the  mass  is  small  or  the  flux  is  large,  e.g.,  in  the  motion  of  comet  tails 
or  spacecraft,  in  stellar  interiors,  and  in  the  light  of  lasers. 
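To get a feeling for the magnitudes, the sketch below evaluates the momentum of a single photon and the pressure exerted by a fully absorbed beam, taken as the energy density $I/c$ of the beam. The function names and the irradiance of 1000 W/m$^2$ (roughly bright sunlight) are assumptions for illustration.

```python
PLANCK_H = 6.626e-34  # Planck's constant in J*s (rounded)
LIGHT_C = 2.998e8     # speed of light in m/s (rounded)

def photon_momentum(wavelength_m):
    """Momentum of one photon, p = h/lambda, in kg*m/s."""
    return PLANCK_H / wavelength_m

def radiation_pressure_absorbed(irradiance_w_m2):
    """Pressure on a perfectly absorbing surface = energy density = I/c, in Pa."""
    return irradiance_w_m2 / LIGHT_C

momentum_550 = photon_momentum(550e-9)
pressure_sun = radiation_pressure_absorbed(1000.0)  # assumed irradiance

print(f"photon momentum at 550 nm  : {momentum_550:.3e} kg*m/s")
print(f"pressure of 1000 W/m^2 beam: {pressure_sun:.3e} Pa")
```

The pressure is tiny compared with atmospheric pressure, which is why it matters only for small masses or very large fluxes.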
6.13  Optical  Frequencies  -  Detector  Response


The general equation for a traveling electromagnetic wave is:

$$y[z, t] = A_0 \cos[kz \mp \omega t] = A_0 \cos\left[2\pi\left(\frac{z}{\lambda} \mp \nu t\right)\right] = A_0\,\mathrm{Re}\left\{e^{i(kz \mp \omega t)}\right\}$$


We sense electromagnetic radiation with detectors, i.e., devices which respond in
some way to incident electromagnetic radiation. The human eye is sensitive only to
visible light, i.e., light with wavelengths in the range 400 nm $\leq \lambda \leq$ 700 nm. This is
not the case for all life, however. The pit viper can see radiation emitted by humans
at a wavelength of about 10 $\mu$m; it needs special receptors on the sides of its head to
do this.


As shown in the plot of the electromagnetic spectrum, the frequencies of visible
wavelengths are quite large: $\nu \sim 10^{15}$ Hz. The temporal period of an optical wave is
therefore $T = \nu^{-1} \sim 10^{-15}$ s. Human visual receptors cannot respond fast enough to
detect  the  periodic  oscillation  of  the  wave  amplitude;  we  see  an  invariant  brightness. 
Note  that  this  limitation  exists  for  all  detectors  of  visible  radiation  (e.g.,  photographic 
film,  light  meters,  etc.);  they  all  respond  to  the  average  brightness.  The  same  is 
true  for  hearing;  your  ear  cannot  detect  the  variation  of  sound  pressure  due  to  the 
oscillation  at  frequencies  above  a  few  Hz.  Because  water  waves  have  a  much  lower 
frequency,  the  amplitude  and  phase  of  the  wave  can  be  measured.  Similarly,  the 
phase  can  be  measured  of  electromagnetic  waves  that  have  a  much  smaller  temporal 
frequency,  e.g.,  radio  waves. 


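The average amplitude of a sinusoidal wave over a measurement time $T_d$ is:

$$\langle y[z, t] \rangle = \frac{1}{T_d}\int_0^{T_d} y[z, t]\, dt = \frac{1}{T_d}\int_0^{T_d} A_0 \cos[kz - \omega t]\, dt = -\frac{A_0}{\omega T_d} \sin[kz - \omega t]\,\Big|_{t=0}^{t=T_d}$$

Since $y[z, t]$ is sinusoidal, the average value of the wave will tend to zero unless $T_d$ is
smaller than the wave’s temporal period. However, the intensity (squared-magnitude)
of the wave does not average to zero:

$$\mathcal{E} \propto \langle y^2[z, t] \rangle = \frac{1}{T_d}\int_0^{T_d} y^2[z, t]\, dt = \frac{1}{T_d}\int_0^{T_d} A_0^2 \cos^2[kz - \omega t]\, dt$$

$$= \frac{A_0^2}{T_d}\int_0^{T_d} \cos^2[kz - \omega t]\, dt \approx \frac{A_0^2}{T_d} \cdot \frac{T_d}{2} = \frac{A_0^2}{2}$$

$$\implies \mathcal{E} \propto \langle y^2[z, t] \rangle = \frac{A_0^2}{2} \quad \text{if } T_d \gg \nu^{-1}$$

because the average value of $\cos^2[x] = \frac{1}{2}$.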

Detectors  of  visible  light  are  sensitive  to  time-averaged  intensity,  not  amplitude. 
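A short numerical experiment illustrates the difference between averaging the amplitude and averaging the intensity. This is a sketch in scaled units (the wave period is 1 and the averaging time is about 1000 periods); the helper name `time_average` and all numerical values are illustrative assumptions.

```python
import math

def time_average(signal, t_d, n_samples=50000):
    """Midpoint-rule estimate of (1/T_d) * integral of signal(t) from 0 to T_d."""
    dt = t_d / n_samples
    total = sum(signal((j + 0.5) * dt) for j in range(n_samples))
    return total * dt / t_d

A0 = 1.0                 # wave amplitude
NU = 1.0                 # temporal frequency in scaled units, so the period is 1
KZ = 0.3                 # fixed spatial phase k*z at the detector (arbitrary)
OMEGA = 2.0 * math.pi * NU
T_D = 1000.3             # averaging time, much longer than one period

def wave(t):
    """y[z, t] = A0 * cos[k*z - omega*t] at the detector position."""
    return A0 * math.cos(KZ - OMEGA * t)

mean_amplitude = time_average(wave, T_D)
mean_intensity = time_average(lambda t: wave(t) ** 2, T_D)

print("time-averaged amplitude:", mean_amplitude)
print("time-averaged intensity:", mean_intensity)
```

The averaged amplitude is negligible while the averaged squared amplitude settles near $A_0^2/2$, in agreement with the result above.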


Chapter  7 


Propagation  of  Light  Waves 


7.1  Wavefronts


7.1.1  Plane  Waves 


The  form  of  any  wave  (matter  or  electromagnetic)  is  determined  by  its  source  and 
described  by  the  shape  of  its  wavefront,  i.e.,  the  locus  of  points  of  constant  phase.  If 
a  traveling  wave  is  emitted  by  a  planar  source,  then  the  points  of  constant  phase  form 
a  plane  surface  parallel  to  the  face  of  the  source.  Such  a  wave  is  called  a  plane  wave, 
and  travels  in  one  direction  (ideally).  Since  energy  is  conserved,  the  total  energy 
in  the  wave  must  equal  the  energy  emitted  by  the  source,  and  therefore  the  energy 
density (the energy passing through a unit area), is constant for a plane wave. Recall
that in a wave of amplitude $A$ and frequency $\omega$, the energy $\mathcal{E} \propto A^2\omega^2$. Therefore, for
a plane wave, the amplitude is constant; the wave does not attenuate.

Plane wave traveling toward $z = +\infty$ at velocity $v_\phi = \frac{\omega}{k}$, with wavelength $\lambda = \frac{2\pi}{k}$, frequency
$\nu = \frac{\omega}{2\pi}$, and amplitude $A_0$:

$$f[x, y, z, t] = A_0 \cos[kz - \omega t]$$

(n.b., no variation in $x$ or $y$)

General 3-D plane wave traveling in a direction $\mathbf{k} = [k_x, k_y, k_z]$, with $\mathbf{r} = [x, y, z]$ and the
definition of the scalar product (dot product):

$$f[\mathbf{r}, t] = A_0 \cos[\mathbf{k} \cdot \mathbf{r} - \omega t], \qquad \mathbf{k} \cdot \mathbf{r} = k_x x + k_y y + k_z z$$

[Figure: plane wave moving along the z-axis.]


7.1.2  Cylindrical  Waves 

If a wave is emitted from a line source, the wavefronts are cylindrical. Since the wave
expands to fill a cylinder of radius $r$ and height $h$, the wavefront crosses a cylindrical area that
grows as Area $= 2\pi r h \propto r$. Therefore, since energy is conserved, the energy per unit
area must decrease as $r$ increases:

$$\mathcal{E} = \text{constant} \implies \frac{\mathcal{E}}{\text{Area}} = \frac{\mathcal{E}}{2\pi r h} \propto \frac{1}{r} \implies \left(A[r]\right)^2 \propto \frac{A_0^2}{r} \implies \text{amplitude} \propto \frac{1}{\sqrt{r}}$$

The equation for a cylindrical wavefront emerging from (or collapsing into) a line
source is:

$$f[x, y, z, t] = A[r] \cos[kr \mp \omega t] = \frac{A_0}{\sqrt{r}} \cos[kr \mp \omega t]$$

$$r = \sqrt{x^2 + y^2} > 0$$

“$-$” $\implies$ emerging
“$+$” $\implies$ collapsing

$A_0$ = amplitude at $r = 1$


Cylindrical waves expanding from a line source.


7.1.3  Spherical  Waves 


The wavefront emerging from (or collapsing into) a point is spherical. The area the
wave must cross increases as $x^2 + y^2 + z^2 = r^2$ (area of sphere is $4\pi r^2$). Therefore the
energy density drops as $r^{-2}$ and the amplitude of the wave must decrease as $r^{-1}$. The
equation for a spherical wave is

$$f[x, y, z, t] = f[r, t] = A[r] \cos[kr \mp \omega t] = \frac{A_0}{r} \cos[kr \mp \omega t], \text{ where } r > 0$$

“$-$” $\implies$ emerging
“$+$” $\implies$ collapsing

$A_0$ = amplitude at $r = 1$


Note  the  pattern  for  the  amplitude  of  plane,  cylindrical,  and  spherical  waves: 


plane wave $\implies$ 2-D source (plane) $\implies$ amplitude $A[r] \propto r^{0} = 1$
cylindrical wave $\implies$ 1-D source (line) $\implies$ $A[r] \propto r^{-1/2}$
spherical wave $\implies$ 0-D source (point) $\implies$ $A[r] \propto r^{-1}$

Spherical waves expanding from a point source.
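The amplitude pattern follows from energy conservation: the squared amplitude times the area crossed by the wavefront must not depend on $r$. The sketch below checks this for the three source geometries; the function names, the unit cylinder height, and the unit cross-section for the plane wave are illustrative assumptions.

```python
import math

def amplitude(r, source_dim, a0=1.0):
    """Amplitude falloff for a plane (2-D), line (1-D), or point (0-D) source."""
    exponent = {2: 0.0, 1: -0.5, 0: -1.0}[source_dim]
    return a0 * r**exponent

def wavefront_area(r, source_dim, height=1.0):
    """Area crossed by the wavefront at distance r from the source."""
    if source_dim == 2:
        return 1.0                            # fixed unit cross-section
    if source_dim == 1:
        return 2.0 * math.pi * r * height     # side of a cylinder
    return 4.0 * math.pi * r**2               # surface of a sphere

def energy_through_wavefront(r, source_dim):
    """Quantity proportional to the total energy: amplitude^2 times area."""
    return amplitude(r, source_dim) ** 2 * wavefront_area(r, source_dim)

for dim, name in ((2, "plane"), (1, "cylindrical"), (0, "spherical")):
    near = energy_through_wavefront(1.0, dim)
    far = energy_through_wavefront(7.0, dim)
    print(f"{name:12s}: energy at r=1 -> {near:.6f}, at r=7 -> {far:.6f}")
```

In each case the two printed values agree, so the stated falloff is exactly what conservation of energy requires.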


7.2  Huygens’  Principle 

I,  §1,  §3 

The spherical wave is the basic wave for light propagation using Huygens’ principle.
In 1678, Christiaan Huygens theorized a model for light propagation that
claimed that each point on a propagating wavefront (regardless of “shape”) could be
assumed to be a source of a new spherical wave. The sum of these secondary spherical
“wavelets” produced the subsequent wavefronts. Huygens’ principle had the glaring
disadvantage that these secondary spherical wavefronts propagated “backwards” as
well as forwards. This problem was later solved by Fresnel and Kirchhoff in the 19th
century. With that correction, the Huygens model provides a very useful model for
light propagation that naturally leads to expressions for “diffracted” light.


Chapter  8 

Interaction  of  Light  and  Matter 


8.1  Electromagnetic  Waves  at  an  Interface 

A  beam  of  light  (implicitly  a  plane  wave)  in  vacuum  or  in  an  isotropic  medium  propa¬ 
gates  in  the  particular  fixed  direction  specified  by  its  Poynting  vector  until  it  encoun¬ 
ters  the  interface  with  a  different  medium.  The  light  causes  the  charges  (electrons, 
atoms,  or  molecules)  in  the  medium  to  oscillate  and  thus  emit  additional  light  waves 
that  can  travel  in  any  direction  (over  the  sphere  of  47T  steradians  of  solid  angle).  The 
oscillating  particles  vibrate  at  the  frequency  of  the  incident  light  and  re-emit  energy 
as  light-  of  that  frequency  (this  is  the  mechanism  of  light  “scattering”).  If  the  emit¬ 
ted  light-  is  “out-  of  phase”  wit-h  the  incident-  light-  (phase  difference  =  ±7r  radians), 
then  the  two  waves  interfere  destructively  and  the  original  beam  is  attenuated.  If  the 
attenuation  is  nearly  complete,  the  incident-  light-  is  said  t-o  be  “absorbed.”  Scattered 
light-  may  interfere  constructively  with  the  incident-  light-  in  certain  directions,  forming 
beams  that-  have  been  reflected  and/or  transmitted.  The  constructive  interference  of 
the  transmitted  beam  occurs  at  the  angle  that-  satisfies  Snell’s  law;  while  that-  after 
reflection  occurs  for  ^reflected  =  incident  •  The  mathematics  are  based  on  Maxwell’s 
equations  for  the  three  waves  and  the  continuity  conditions  that-  must  be  satisfied 
at  the  boundary.  The  equations  for  these  three  electromagnetic  waves  are  not-  diffi¬ 
cult-  t-o  derive,  though  the  process  is  somewhat-  tedious.  The  equations  determine  the 
properties  of  light-  on  either  side  of  the  interface  and  lead  t-o  the  phenomena  of: 

1.  Equal  angles  of  incidence  and  reflection; 

2.  Snell’s Law that relates the incident and refracted wave;

3.  Relative intensities of the three waves;

4.  Relative phases of the three light waves; and

5.  States  of  polarization  of  the  three  waves. 

For simplicity, we consider only plane waves, so that the different beams are
specified by single wavevectors $\mathbf{k}_n$ that are valid at all points in a
medium and that point in the direction of propagation. The lengths of the
wavevectors are determined by the wavelength in the medium:

$$|\mathbf{k}_n| = \frac{2\pi}{\lambda_n} = \frac{2\pi n}{\lambda_0}$$

where $\lambda_0$ is the wavelength in vacuum and $\lambda_n$ is the wavelength in
the medium. The interface between the media is assumed to be the x-y plane located
at z = 0. The incident wavevector $\mathbf{k}_0$, the reflected vector
$\mathbf{k}_r$, the transmitted vector $\mathbf{k}_t$, and the unit vector
$\hat{\mathbf{n}}$ normal to the interface are shown:


Figure: The k vectors of the incident, reflected, and “transmitted” (refracted)
waves at the interface between two media of index $n_1$ and $n_2$ (where
$n_2 > n_1$ in the example shown).

The angles $\theta_0$, $\theta_r$, and $\theta_t$ are measured from the normal, so
that $\theta_0, \theta_t > 0$ and $\theta_r < 0$ as drawn.

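The incident and reflected beams are in the same medium (with $n = n_1$) and so
have the same wavelength:

$$|\mathbf{k}_0| = |\mathbf{k}_r| = \frac{\omega_0}{v_1} = \frac{2\pi n_1}{\lambda_0}$$

$$\lambda_1 = \frac{2\pi}{|\mathbf{k}_0|} = \frac{2\pi}{|\mathbf{k}_r|} = \frac{\lambda_0}{n_1}$$

The wavelength of the transmitted beam is different due to the different index of
refraction:

$$\lambda_2 = \frac{2\pi}{|\mathbf{k}_t|} = \frac{\lambda_0}{n_2}$$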
As drawn, the normal to the surface is specified by the unit vector perpendicular
to the interface; in this case, it is directed along the z-axis:

$$\hat{\mathbf{n}} = \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix}$$

(we could have defined $\hat{\mathbf{n}}$ in the opposite direction).

The incident electric field is a sinusoidal oscillation that may be written in complex
notation:

$$\mathbf{E}_{incident} = \mathbf{E}_0 \exp[+i(\mathbf{k}_0\cdot\mathbf{r} - \omega_0 t)]$$

where $\mathbf{r} = [x, y, z]$ is the position vector of the location where the phase
$\mathbf{k}_0\cdot\mathbf{r} - \omega_0 t$ is measured; note that the phases measured
at all positions in a plane perpendicular to the incident wavevector $\mathbf{k}_0$
must be equal (because this is a plane wave).

The reflected and transmitted waves have the general forms:

$$\mathbf{E}_{reflected} = \mathbf{E}_r \exp[+i(\mathbf{k}_r\cdot\mathbf{r} - \omega_r t + \phi_r)]$$

$$\mathbf{E}_{transmitted} = \mathbf{E}_t \exp[+i(\mathbf{k}_t\cdot\mathbf{r} - \omega_t t + \phi_t)]$$

where we have yet to demonstrate that $\omega_r = \omega_t = \omega_0$. The constants
$\phi_r$ and $\phi_t$ are the (perhaps different) initial phases of the reflected and
transmitted waves.


8.1.1  Snell’s  Law  for  Reflection  and  Refraction 

One boundary condition that must be satisfied is that the phases of all three waves
must match at the interface (z = 0) at all times:

$$(\mathbf{k}_0\cdot\mathbf{r} - \omega_0 t)|_{z=0} = (\mathbf{k}_r\cdot\mathbf{r} - \omega_r t + \phi_r)|_{z=0} = (\mathbf{k}_t\cdot\mathbf{r} - \omega_t t + \phi_t)|_{z=0}$$

This equivalence immediately implies that the temporal frequencies of the three
waves must be identical ($\omega_0$), because otherwise the phases would change by
different amounts as functions of time. In words, the temporal frequency is invariant
with medium, or the “color” of the light does not change as the light travels into a
different medium. Therefore the spatial vectors must satisfy the conditions:

$$(\mathbf{k}_0\cdot\mathbf{r})|_{z=0} = (\mathbf{k}_r\cdot\mathbf{r} + \phi_r)|_{z=0} = (\mathbf{k}_t\cdot\mathbf{r} + \phi_t)|_{z=0}$$

Since the scalar products of the three wavevectors with the same position vector
$\mathbf{r}$ must be equal, the three vectors $\mathbf{k}_0$, $\mathbf{k}_r$, and
$\mathbf{k}_t$ must all lie in the same plane (call it the x-z plane, as shown in the
drawing). The number of waves per unit length at any instant of time must be equal
at the boundary for all three waves, as shown:

$$(\mathbf{k}_0)_x = (\mathbf{k}_r)_x = (\mathbf{k}_t)_x$$




Figure: The x-components of the three wavevectors (for the incident, reflected, and
transmitted, i.e., refracted, waves) must match at the interface to ensure that each
produces the same number of waves per unit length.


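From the definitions of the vectors we can also see that:

$$(\mathbf{k}_0)_x = |\mathbf{k}_0| \cos\left[\frac{\pi}{2} - \theta_0\right] = |\mathbf{k}_0| \sin[\theta_0]$$

$$(\mathbf{k}_r)_x = |\mathbf{k}_r| \cos\left[\frac{\pi}{2} + \theta_r\right] = |\mathbf{k}_r| \sin[-\theta_r]$$

where the factor of −1 on the reflected angle is because the angle measured from the
normal is clockwise, and hence negative. The equality of the lengths of the incident
and reflected wavevectors immediately demonstrates that:

$$(\mathbf{k}_0)_x = (\mathbf{k}_r)_x \implies |\mathbf{k}_0| \sin[\theta_0] = |\mathbf{k}_r| \sin[-\theta_r]$$

$$\implies |\mathbf{k}_0| \sin[\theta_0] = |\mathbf{k}_0| \sin[-\theta_r]$$

$$\implies \sin[\theta_0] = \sin[-\theta_r]$$

$$\implies \theta_0 = -\theta_r$$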


In  words,  the  angle  of  reflection  is  equal  to  the  negative  of  the  angle  of  incidence.  We 
usually  ignore  the  sign  of  the  angle  and  say  that  the  angles  of  incidence  and  reflection 
are  equal. 


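Now make the same observation for the transmitted wave:

$$(\mathbf{k}_0)_x = |\mathbf{k}_0| \sin[\theta_0] = \frac{2\pi n_1}{\lambda_0} \sin[\theta_0]$$

$$(\mathbf{k}_t)_x = |\mathbf{k}_t| \cos\left[\frac{\pi}{2} - \theta_t\right] = |\mathbf{k}_t| \sin[\theta_t] = \frac{2\pi n_2}{\lambda_0} \sin[\theta_t]$$

We equate these to derive the relationship of the angles of the incident and
transmitted wavevectors:

$$\frac{2\pi n_1}{\lambda_0} \sin[\theta_0] = \frac{2\pi n_2}{\lambda_0} \sin[\theta_t]$$

$$\implies n_1 \sin[\theta_0] = n_2 \sin[\theta_t]$$

We recognize this to be (of course) Snell’s law for refraction.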
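As a quick numerical illustration of Snell’s law, the following Python sketch evaluates the transmitted angle for a given incident angle. It is only an example: the function name `refraction_angle` and the choice to return `None` when no real transmitted angle exists are our own assumptions, not part of the derivation.

```python
import math

def refraction_angle(n1, n2, theta0_deg):
    """Transmitted angle in degrees from n1*sin(theta0) = n2*sin(theta_t).

    Returns None when sin(theta_t) would exceed 1 (no real transmitted angle).
    """
    sin_t = n1 * math.sin(math.radians(theta0_deg)) / n2
    if abs(sin_t) > 1.0:
        return None
    return math.degrees(math.asin(sin_t))

# Air (n1 = 1.0) into glass (n2 = 1.5) at 30 degrees incidence:
print(refraction_angle(1.0, 1.5, 30.0))  # about 19.47 degrees
```

The transmitted ray bends toward the normal when the second medium has the larger index, as the formula requires.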

The reflection law may be cast into the form of Snell’s refraction law by assuming
that the index of refraction is negative for the reflected beam:

$$n_1 \sin[\theta_0] = -n_1 \sin[\theta_r]$$

$$\implies \sin[\theta_r] = -\sin[\theta_0]$$

$$\implies \theta_r = -\theta_0$$

Note that these laws were derived without having to consider the vector nature of
the electric and magnetic fields, but rather just the spatial frequencies of the waves
at the boundaries. The next task is not quite this simple.


8.1.2  Boundary  Conditions  for  Electric  and  Magnetic  Fields 


We’ve  determined  the  angles  of  the  reflected  and  transmitted  (refracted)  plane  waves 
in  the  form  of  Snell’s  law(s).  We  also  need  to  evaluate  the  “quantity”  of  light  reflected 
and  refracted  due  to  the  boundary.  Since  the  geometries  of  the  fields  will  depend  on 
the  directions  of  the  electric  field  vectors,  we  will  have  to  consider  this  aspect  in 
the  derivations.  In  short,  this  discussion  will  depend  on  the  “polarization”  of  the 
electric  field  (different  from  the  “polarizability”  of  the  medium).  We  will  again  have 
to  match  appropriate  boundary  conditions  at  the  boundary,  but  these  conditions 
apply  to  the  vector  components  of  the  electric  and  magnetic  fields  on  each  side  of  the 
boundary.  We  use  the  same  notation  as  before  for  amplitudes  of  the  electric  fields  of 
the  incident,  reflected,  and  transmitted  (refracted)  waves.  Faraday’s  and  Ampere’s 
laws  (the  Maxwell  equations  involving  curl)  for  plane  waves  can  be  recast  into  forms 
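that are more useful for the current task:

$$\nabla\times\mathbf{E} \propto -\frac{\partial\mathbf{B}}{\partial t}$$

$$\nabla\times\mathbf{B} \propto +\frac{\partial\mathbf{E}}{\partial t}$$

We need the constants of proportionality in this derivation. Recall that they depend
on the system of units. We will use the MKS system here:

$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}$$

$$\nabla\times\mathbf{B} = +\epsilon\mu\,\frac{\partial\mathbf{E}}{\partial t}$$

where $\epsilon$ and $\mu$ are the permittivity and permeability of the medium,
respectively, and the phase velocity of light in the medium is:

$$v_\phi = \frac{1}{\sqrt{\epsilon\mu}}$$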

The incident field is assumed to be a plane wave of the form already mentioned:

$$\mathbf{E}_{incident}[x, y, z, t] = \mathbf{E}_0 \exp[+i(\mathbf{k}_0\cdot\mathbf{r} - \omega_0 t)]$$

$$= \mathbf{E}_0 \exp[+i(k_{0x}x + k_{0y}y + k_{0z}z - \omega_0 t)]$$

$$= (\hat{\mathbf{x}}E_{0x} + \hat{\mathbf{y}}E_{0y} + \hat{\mathbf{z}}E_{0z}) \exp[+i(k_{0x}x + k_{0y}y + k_{0z}z - \omega_0 t)]$$

We know that $\mathbf{E}_0 \perp \mathbf{k}_0$. In our coordinate system, the incident
wavevector lies in the x-z plane (the plane defined by $\mathbf{k}_0$ and
$\hat{\mathbf{n}}$), so that $k_{0y} = 0$:

$$\mathbf{E}_{incident}[x, y, z, t] = (\hat{\mathbf{x}}E_{0x} + \hat{\mathbf{y}}E_{0y} + \hat{\mathbf{z}}E_{0z}) \exp[+i(k_{0x}x + k_{0z}z - \omega_0 t)]$$

The boundary conditions that must be satisfied by the electric fields and by the
magnetic fields at the boundary are perhaps not obvious. Consider the figure on the
left:

Figure: The boundary conditions on the electric and magnetic fields at the boundary
are established from these situations.


We assume that there is no charge or current on the surface and within the cylinder
that straddles the boundary. If the height of the cylinder is decreased towards zero,
then Gauss’ laws establish that the flux of the electric and magnetic fields through
the top and bottom of the cylinder (the z components in this geometry) must cancel:

$$\epsilon_1\mathbf{E}_1\cdot\hat{\mathbf{n}} - \epsilon_2\mathbf{E}_2\cdot\hat{\mathbf{n}} = 0 \implies \epsilon_1 E_{1z} = \epsilon_2 E_{2z}$$

$$\mathbf{B}_1\cdot\hat{\mathbf{n}} - \mathbf{B}_2\cdot\hat{\mathbf{n}} = 0 \implies B_{1z} = B_{2z}$$

The flux of the electric field in a medium is the so-called “displacement” field
$\mathbf{D} = \epsilon\mathbf{E}$ and the flux of the magnetic field is the field
$\mathbf{B}$. Thus Gauss’ law determines that the normal components of $\mathbf{D}$
and of $\mathbf{B}$ are continuous across the boundary of the medium.

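The figure on the right is a rectangular path (a “loop”) that also straddles the
boundary. The unit vector $\hat{\mathbf{t}} \perp \hat{\mathbf{n}}$ points along the
surface. If the “height” of the loop $dh \to 0$, then the circulations of the electric
and magnetic fields must cancel:

$$\mathbf{E}_1\cdot\hat{\mathbf{t}} - \mathbf{E}_2\cdot\hat{\mathbf{t}} = 0 \implies E_{1x} = E_{2x}$$

$$\frac{\mathbf{B}_1}{\mu_1}\cdot\hat{\mathbf{t}} - \frac{\mathbf{B}_2}{\mu_2}\cdot\hat{\mathbf{t}} = 0 \implies \frac{B_{1x}}{\mu_1} = \frac{B_{2x}}{\mu_2}$$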


We now want to solve Maxwell’s equations for an incident plane wave, which will
depend on the incident angle $\theta_0$ and on the vector direction of the electric
field. It is convenient to evaluate these conditions in two cases of linearly
polarized waves: (1) where the polarization is perpendicular to the plane of incidence
defined by $\hat{\mathbf{n}}$ and $\mathbf{k}_0$ (the so-called “s” polarization or
transverse electric (TE) waves), which also means that the electric field vector is
“parallel” to the interface, and (2) where the polarization is parallel to the plane
of incidence defined by $\hat{\mathbf{n}}$ and $\mathbf{k}_0$ (the so-called “p”
polarization or transverse magnetic (TM) waves). The two cases are depicted below:

Figure: The electric field is perpendicular to the plane of incidence; this is the
TRANSVERSE ELECTRIC field (TE, also called the “s” polarization).

Figure: The electric field is parallel to the plane of incidence; this is the
TRANSVERSE MAGNETIC field (TM, also called the “p” polarization).


8.1.3  Transverse  Electric  Waves,  s  Polarization 

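In the TE case in our geometry, the electric field is oriented along the y direction and
the wavevector has components in the x and z directions:

$$\mathbf{E}_{incident}[x, y, z, t] = (\hat{\mathbf{x}}\cdot 0 + \hat{\mathbf{y}}\cdot|\mathbf{E}_0| + \hat{\mathbf{z}}\cdot 0) \exp[+i(k_{0x}x + k_{0z}z - \omega_0 t)]$$

$$= \hat{\mathbf{y}}\,|\mathbf{E}_0| \exp[+i(k_{0x}x + k_{0z}z - \omega_0 t)]$$

The magnetic field is derived from the relation:

$$\mathbf{B} = \frac{n}{c}\,\hat{\mathbf{k}}\times\mathbf{E}$$

$$\mathbf{B}_{incident}[x, y, z, t] = \left(-\cos[\theta_0]\,\frac{n_1}{c}|\mathbf{E}_0|\,\hat{\mathbf{x}} + 0\,\hat{\mathbf{y}} + \sin[\theta_0]\,\frac{n_1}{c}|\mathbf{E}_0|\,\hat{\mathbf{z}}\right) \exp[+i(k_{0x}x + k_{0z}z - \omega_0 t)]$$

The reflected fields are:

$$\mathbf{E}_{reflected}[x, y, z, t] = \hat{\mathbf{y}}\,|\mathbf{E}_r| \exp[+i(k_{rx}x + k_{rz}z - \omega_0 t)]$$

$$\mathbf{B}_{reflected}[x, y, z, t] = \left(+\cos[-\theta_0]\,\frac{n_1}{c}|\mathbf{E}_r|\,\hat{\mathbf{x}} + 0\,\hat{\mathbf{y}} - \sin[-\theta_0]\,\frac{n_1}{c}|\mathbf{E}_r|\,\hat{\mathbf{z}}\right) \exp[+i(k_{rx}x + k_{rz}z - \omega_0 t)]$$

$$= \left(+\cos[\theta_0]\,\frac{n_1}{c}|\mathbf{E}_r|\,\hat{\mathbf{x}} + \sin[\theta_0]\,\frac{n_1}{c}|\mathbf{E}_r|\,\hat{\mathbf{z}}\right) \exp[+i(k_{rx}x + k_{rz}z - \omega_0 t)]$$

and the transmitted (refracted) fields are:

$$\mathbf{E}_{transmitted}[x, y, z, t] = \hat{\mathbf{y}}\,|\mathbf{E}_t| \exp[+i(k_{tx}x + k_{tz}z - \omega_0 t)]$$

$$\mathbf{B}_{transmitted}[x, y, z, t] = \left(-\cos[\theta_t]\,\frac{n_2}{c}|\mathbf{E}_t|\,\hat{\mathbf{x}} + 0\,\hat{\mathbf{y}} + \sin[\theta_t]\,\frac{n_2}{c}|\mathbf{E}_t|\,\hat{\mathbf{z}}\right) \exp[+i(k_{tx}x + k_{tz}z - \omega_0 t)]$$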


The only components of the electric field at the interface are transverse, so the only
boundary condition to be satisfied by the electric field is that on its tangential
component:

$$E_0 + E_r = E_t \implies 1 + \frac{E_r}{E_0} = \frac{E_t}{E_0}$$

This is typically expressed in terms of the reflection and transmission coefficients for
the amplitude of the waves (not the power of the waves; these are the reflectance R
and transmittance T of the interface, which will be considered very soon):

$$r_{TE} = \frac{E_r}{E_0}$$

$$t_{TE} = \frac{E_t}{E_0}$$

where the subscripts denote the transverse electric polarization. The boundary
condition for the normal magnetic field yields the expression:

$$\frac{n_1}{c} \sin[\theta_0]\,(E_0 + E_r) = \frac{n_2}{c} \sin[\theta_t]\,E_t$$

while that for the tangential magnetic field is:

$$\frac{n_1}{\mu_1 c} \cos[\theta_0]\,(E_0 - E_r) = \frac{n_2}{\mu_2 c} \cos[\theta_t]\,E_t$$

These may be solved simultaneously for r and t to yield expressions in terms of the
indices, permeabilities, and angles:


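Reflectance Coefficient for TE Waves

$$r_{TE} = \frac{E_r}{E_0} = \frac{\frac{n_1}{\mu_1}\cos[\theta_0] - \frac{n_2}{\mu_2}\cos[\theta_t]}{\frac{n_1}{\mu_1}\cos[\theta_0] + \frac{n_2}{\mu_2}\cos[\theta_t]}$$

$$r_{TE} = \frac{n_1\cos[\theta_0] - n_2\cos[\theta_t]}{n_1\cos[\theta_0] + n_2\cos[\theta_t]} \quad \text{if } \mu_1 = \mu_2 \text{ (usual case)}$$

Transmission Coefficient for TE Waves

$$t_{TE} = \frac{E_t}{E_0} = \frac{+2\frac{n_1}{\mu_1}\cos[\theta_0]}{\frac{n_1}{\mu_1}\cos[\theta_0] + \frac{n_2}{\mu_2}\cos[\theta_t]}$$

$$t_{TE} = \frac{+2n_1\cos[\theta_0]}{n_1\cos[\theta_0] + n_2\cos[\theta_t]} \quad \text{if } \mu_1 = \mu_2$$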

Again, these are the amplitude coefficients; the reflectance and transmittance of light
at the surface relate the energies or powers. These measure the ratios of the reflected
or transmitted power to the incident power. The power is proportional to the product
of the magnitude of the Poynting vector and the area of the beam. The areas of
the beams before and after reflection are identical, which means that the reflectance
is just the ratio of the magnitudes of the Poynting vectors. This reduces to the square
of the amplitude reflection coefficient:

$$R = r^2$$

which reduces to this expression for the TE case:

$$R_{TE} = \left(\frac{n_1\cos[\theta_0] - n_2\cos[\theta_t]}{n_1\cos[\theta_0] + n_2\cos[\theta_t]}\right)^2$$
The transmission T is a bit more complicated to compute, because the refraction
at the interface changes the “width” of the beam in one direction (along the x-axis in
this example), so that the area of the transmitted beam is different from that of the
incident beam. This is illustrated in the figure for a case with $n_1 > n_2$.

Figure: Demonstration that the areas of the beams differ in the two media. This must
be accounted for in the calculation of the power transmission T.


The magnitude of the Poynting vector is proportional to the product of the index of
refraction and the squared magnitude of the electric field:

$$|\mathbf{s}_1| \propto n_1 |\mathbf{E}_0|^2$$

$$|\mathbf{s}_2| \propto n_2 |\mathbf{E}_t|^2$$

The ratio of the powers is:

$$T = \frac{|\mathbf{s}_2|\,A_2}{|\mathbf{s}_1|\,A_1} = \frac{n_2 |\mathbf{E}_t|^2}{n_1 |\mathbf{E}_0|^2}\cdot\frac{A_2}{A_1} = \frac{n_2}{n_1}\cdot t^2\cdot\frac{A_2}{A_1}$$

The area of the transmitted beam changes in proportion to the dimension along the
x-axis in this case, which allows us to see that:

$$\frac{A_2}{A_1} = \frac{w_2}{w_1} = \frac{\sin\left[\frac{\pi}{2} - \theta_t\right]}{\sin\left[\frac{\pi}{2} - \theta_0\right]} = \frac{\cos[\theta_t]}{\cos[\theta_0]}$$

which leads to the final expression for the transmission at the interface:

$$T = \left(\frac{n_2\cos[\theta_t]}{n_1\cos[\theta_0]}\right) t^2$$

Snell’s law gives a relationship between the incident and transmitted angles:

$$n_1 \sin[\theta_0] = n_2 \sin[\theta_t]$$

$$\implies \sin[\theta_t] = \frac{n_1}{n_2}\sin[\theta_0]$$

$$\implies \cos[\theta_t] = \sqrt{1 - \sin^2[\theta_t]} = \sqrt{1 - \left(\frac{n_1}{n_2}\right)^2 \sin^2[\theta_0]}$$

Thus we can write down the transmittance T in terms of the refractive indices and
the incident angle:

$$T = \left(\frac{\sqrt{n_2^2 - n_1^2\sin^2[\theta_0]}}{n_1\cos[\theta_0]}\right) t^2$$

For the TE case, the transmission is:

$$T_{TE} = \left(\frac{\sqrt{n_2^2 - n_1^2\sin^2[\theta_0]}}{n_1\cos[\theta_0]}\right) \left(\frac{+2n_1\cos[\theta_0]}{n_1\cos[\theta_0] + n_2\cos[\theta_t]}\right)^2$$

These will be plotted for some specific cases after we evaluate the coefficients for TM
waves.
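A useful sanity check on the area factor is that the TE reflectance and transmittance must sum to unity at a lossless interface. The Python sketch below tests this numerically; the function name `te_power_coefficients` is our own, and it assumes equal permeabilities and $n_1 \sin[\theta_0] < n_2$ so that the square root is real.

```python
import math

def te_power_coefficients(n1, n2, theta0_deg):
    """Return (R_TE, T_TE), assuming mu1 = mu2 and n1*sin(theta0) < n2."""
    theta0 = math.radians(theta0_deg)
    cos0 = math.cos(theta0)
    # n2*cos(theta_t) rewritten with Snell's law
    n2_cos_t = math.sqrt(n2 ** 2 - (n1 * math.sin(theta0)) ** 2)
    denom = n1 * cos0 + n2_cos_t
    r = (n1 * cos0 - n2_cos_t) / denom
    t = 2.0 * n1 * cos0 / denom
    reflectance = r ** 2
    transmittance = (n2_cos_t / (n1 * cos0)) * t ** 2   # includes the area ratio
    return reflectance, transmittance

for angle in (0.0, 30.0, 60.0, 85.0):
    R, T = te_power_coefficients(1.0, 1.5, angle)
    print(angle, R, T, R + T)   # R + T should equal 1 at every angle
```

Without the factor $n_2\cos[\theta_t] / (n_1\cos[\theta_0])$ the sum would not equal unity, which is why $T \neq t^2$ in general.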
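8.1.4  Transverse Magnetic Waves (p polarization)

In the TM case in our geometry, the electric field is in the x-z plane and the
wavevector has components in the x and z directions:

$$\mathbf{E}_{incident}[x, y, z, t] = (\hat{\mathbf{x}}\,|\mathbf{E}_0|\cos[\theta_0] + \hat{\mathbf{y}}\cdot 0 + \hat{\mathbf{z}}\,|\mathbf{E}_0|\sin[-\theta_0]) \exp[+i(k_{0x}x + k_{0z}z - \omega_0 t)]$$

$$= (\hat{\mathbf{x}}\,|\mathbf{E}_0|\cos[\theta_0] - \hat{\mathbf{z}}\,|\mathbf{E}_0|\sin[\theta_0]) \exp[+i(k_{0x}x + k_{0z}z - \omega_0 t)]$$

The magnetic field is in the y-direction:

$$\mathbf{B}_{incident}[x, y, z, t] = \hat{\mathbf{y}}\,\frac{n_1}{c}|\mathbf{E}_0| \exp[+i(k_{0x}x + k_{0z}z - \omega_0 t)]$$

The reflected fields are:

$$\mathbf{E}_{reflected}[x, y, z, t] = (-\hat{\mathbf{x}}\,|\mathbf{E}_r|\cos[\theta_0] - \hat{\mathbf{z}}\,|\mathbf{E}_r|\sin[\theta_0]) \exp[+i(k_{rx}x + k_{rz}z - \omega_0 t)]$$

$$\mathbf{B}_{reflected}[x, y, z, t] = \hat{\mathbf{y}}\,\frac{n_1}{c}|\mathbf{E}_r| \exp[+i(k_{rx}x + k_{rz}z - \omega_0 t)]$$

and the transmitted (refracted) fields are:

$$\mathbf{E}_{transmitted}[x, y, z, t] = (\hat{\mathbf{x}}\,|\mathbf{E}_t|\cos[\theta_t] - \hat{\mathbf{z}}\,|\mathbf{E}_t|\sin[\theta_t]) \exp[+i(k_{tx}x + k_{tz}z - \omega_0 t)]$$

$$\mathbf{B}_{transmitted}[x, y, z, t] = \hat{\mathbf{y}}\,\frac{n_2}{c}|\mathbf{E}_t| \exp[+i(k_{tx}x + k_{tz}z - \omega_0 t)]$$

In this case, the boundary condition on the normal component of $\mathbf{B}$ is
trivial, but the other components are:

$$\epsilon_1 \sin[\theta_0]\,(E_0 + E_r) = \epsilon_2 \sin[\theta_t]\,E_t$$

$$\cos[\theta_0]\,(E_0 - E_r) = \cos[\theta_t]\,E_t$$

$$\frac{n_1}{\mu_1 c}\,(E_0 + E_r) = \frac{n_2}{\mu_2 c}\,E_t$$

These are solved for the reflection and transmission coefficients:

Transverse Magnetic Waves

$$r_{TM} = \frac{+\frac{n_2}{\mu_2}\cos[\theta_0] - \frac{n_1}{\mu_1}\cos[\theta_t]}{+\frac{n_2}{\mu_2}\cos[\theta_0] + \frac{n_1}{\mu_1}\cos[\theta_t]}$$

which simplifies if the permeabilities are equal (as they usually are):

$$r_{TM} = \frac{+n_2\cos[\theta_0] - n_1\cos[\theta_t]}{+n_2\cos[\theta_0] + n_1\cos[\theta_t]} \quad \text{if } \mu_1 = \mu_2$$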
The corresponding reflectance is:

$$R_{TM} = \left(\frac{+n_2\cos[\theta_0] - n_1\cos[\theta_t]}{+n_2\cos[\theta_0] + n_1\cos[\theta_t]}\right)^2$$

The amplitude transmission coefficient evaluates to:

$$t_{TM} = \frac{2\frac{n_1}{\mu_1}\cos[\theta_0]}{+\frac{n_2}{\mu_2}\cos[\theta_0] + \frac{n_1}{\mu_1}\cos[\theta_t]}$$

Again, if the permeabilities are equal, this simplifies to:

$$t_{TM} = \frac{2n_1\cos[\theta_0]}{+n_2\cos[\theta_0] + n_1\cos[\theta_t]} \quad \text{if } \mu_1 = \mu_2$$

The corresponding transmittance function is:

$$T_{TM} = \left(\frac{\sqrt{n_2^2 - n_1^2\sin^2[\theta_0]}}{n_1\cos[\theta_0]}\right) \left(\frac{2n_1\cos[\theta_0]}{+n_2\cos[\theta_0] + n_1\cos[\theta_t]}\right)^2$$
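The TM expressions can be exercised in the same way as the TE ones. The following Python sketch (with the hypothetical function name `fresnel_tm`, equal permeabilities, and a real transmitted angle assumed) returns the amplitude and power coefficients and confirms that the reflected and transmitted powers sum to the incident power.

```python
import math

def fresnel_tm(n1, n2, theta0_deg):
    """Return (r_TM, t_TM, R_TM, T_TM) for mu1 = mu2 and a real transmitted angle."""
    theta0 = math.radians(theta0_deg)
    sin_t = n1 * math.sin(theta0) / n2          # Snell's law
    if abs(sin_t) > 1.0:
        raise ValueError("no real transmitted angle for these inputs")
    cos0 = math.cos(theta0)
    cos_t = math.sqrt(1.0 - sin_t ** 2)
    denom = n2 * cos0 + n1 * cos_t              # note the swapped indices vs. TE
    r = (n2 * cos0 - n1 * cos_t) / denom
    t = 2.0 * n1 * cos0 / denom
    reflectance = r ** 2
    transmittance = (n2 * cos_t / (n1 * cos0)) * t ** 2
    return r, t, reflectance, transmittance

r, t, R, T = fresnel_tm(1.0, 1.5, 40.0)
print(r, t, R, T, R + T)   # R + T should equal 1
```

Compared with the TE case, only the pairing of the indices with the cosines changes; the area factor in the transmittance is the same.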

8.1.5  Comparison  of  Coefficients  for  TE  and  TM  Waves 

We should compare the coefficients for the two cases of TE and TM waves. The
reflectance coefficients are:

$$r_{TE} = \frac{n_1\cos[\theta_0] - n_2\cos[\theta_t]}{n_1\cos[\theta_0] + n_2\cos[\theta_t]}$$

$$r_{TM} = \frac{+n_2\cos[\theta_0] - n_1\cos[\theta_t]}{+n_2\cos[\theta_0] + n_1\cos[\theta_t]}$$

where the angles are also determined by Snell’s law:

$$n_1 \sin[\theta_0] = n_2 \sin[\theta_t]$$

Note that the angles and the indices for the TE case are “in” the same media, i.e., the
index $n_1$ multiplies the cosine of $\theta_0$, which is in the same medium. The same
condition holds for $n_2$ and $\theta_t$. The opposite is true for the TM case: $n_1$
is applied to $\cos[\theta_t]$ and $n_2$ to $\cos[\theta_0]$. These same observations
also apply to the corresponding transmission coefficients:

$$t_{TE} = \frac{+2n_1\cos[\theta_0]}{n_1\cos[\theta_0] + n_2\cos[\theta_t]}$$

$$t_{TM} = \frac{+2n_1\cos[\theta_0]}{+n_2\cos[\theta_0] + n_1\cos[\theta_t]}$$


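Normal Incidence ($\theta_0 = 0$)

In the case of normal incidence, where $\theta_0 = \theta_r = \theta_t = 0$, the TE
and TM equations evaluate to:

$$r_{TE}|_{\theta_0=0} = \frac{n_1 - n_2}{n_1 + n_2}$$

$$r_{TM}|_{\theta_0=0} = \frac{+n_2 - n_1}{+n_2 + n_1} = -\left(r_{TE}|_{\theta_0=0}\right)$$

$$t_{TE}|_{\theta_0=0} = \frac{+2n_1}{n_1 + n_2}$$

$$t_{TM}|_{\theta_0=0} = \frac{+2n_1}{n_1 + n_2} = t_{TE}|_{\theta_0=0}$$

At normal incidence the magnitudes of the coefficients for the two cases are
identical. Also, the areas of the incident and transmitted waves are identical so
there is no area factor in the transmittance. The resulting formulas for reflectance
and transmittance reduce to:

Normal incidence ($\theta_0 = 0$):

$$R_{TE}(\theta_0 = 0) = R_{TM}(\theta_0 = 0) = \left(\frac{n_1 - n_2}{n_1 + n_2}\right)^2$$

$$T_{TE}(\theta_0 = 0) = T_{TM}(\theta_0 = 0) = \frac{4n_1 n_2}{(n_1 + n_2)^2}$$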


Example: Rare-to-Dense Reflection  If the input medium has a smaller refractive
index n (a rarer medium) than the second (denser) medium, so that $n_1 < n_2$,
then the coefficients are:

$$n_1 = 1.0, \quad n_2 = 1.5$$

$$r_{TE} = \frac{1.0 - 1.5}{1.0 + 1.5} = -0.2 = 0.2\,e^{+i\pi}$$

$$r_{TM} = \frac{1.5 - 1.0}{1.0 + 1.5} = +0.2$$

$$t_{TE} = t_{TM} = \frac{2\cdot 1.0}{1.0 + 1.5} = +0.8$$

$$\implies R_{TE} = R_{TM} = 0.04$$

$$\implies T_{TE} = T_{TM} = 0.96$$

for “rare-to-dense” reflection.

In words, the phase of the reflected light is changed by $\pi$ radians = 180° if
reflected at a “rare-to-dense” interface such as the usual air-to-glass case.


Example: Dense-to-Rare Reflection  If the input medium is “denser” ($n_1 > n_2$),
then these values are obtained:

$$n_1 = 1.5, \quad n_2 = 1.0$$

$$r_{TE} = \frac{1.5 - 1.0}{1.5 + 1.0} = +0.2$$

$$r_{TM} = \frac{1.0 - 1.5}{1.0 + 1.5} = -0.2 = 0.2\,e^{+i\pi}$$

$$\implies R_{TE} = R_{TM} = 0.04$$

$$\implies T_{TE} = T_{TM} = 0.96$$

There is no phase shift of the reflected amplitude in “dense-to-rare” reflection,
commonly called “internal” reflection.
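The two normal-incidence examples can be reproduced with a few lines of Python. This is only a sketch under the sign conventions of this section; the function name `normal_incidence` is our own, and `cmath.phase` is used simply to read off the phase (0 or $\pi$) of each real coefficient.

```python
import cmath

def normal_incidence(n1, n2):
    """Return (r_TE, r_TM, t, R, T) at normal incidence for mu1 = mu2."""
    r_te = (n1 - n2) / (n1 + n2)
    r_tm = (n2 - n1) / (n2 + n1)
    t = 2.0 * n1 / (n1 + n2)
    reflectance = r_te ** 2                 # same value for TE and TM
    transmittance = (n2 / n1) * t ** 2      # no area factor at normal incidence
    return r_te, r_tm, t, reflectance, transmittance

for n1, n2 in ((1.0, 1.5), (1.5, 1.0)):
    r_te, r_tm, t, R, T = normal_incidence(n1, n2)
    # cmath.phase gives pi for a negative real number and 0 for a positive one
    print(n1, n2, r_te, cmath.phase(r_te), r_tm, cmath.phase(r_tm), R, T)
```

Both directions give the same reflectance and transmittance; only the signs (phases) of the amplitude coefficients are exchanged.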


8.1.6  Angular Dependence of Reflection and Transmittance at “Rare-to-Dense” Interface

Consider the graphs of these coefficients for the cases of the “rare-to-dense” interface
($n_1 = 1 < n_2 = 1.5$). The reflection coefficients are plotted vs. incident angle
measured in degrees from 0° (normal incidence) to 90° (grazing incidence).

Figure: Amplitude reflectance and transmittance coefficients for $n_1 = 1.0$ (air) and
$n_2 = 1.5$ (glass) for both TE and TM waves, plotted as functions of the incident
angle from $\theta_0 = 0°$ (normal incidence) to $\theta_0 = 90°$ (grazing incidence).
The reflectance coefficient $r_{TE} < 0$ for all $\theta_0$, which means that there is
a phase shift upon reflection, whereas $r_{TM} > 0$ for $\theta_0 < \theta_B$
(Brewster’s angle). Also note that the transmittance coefficients are very similar
functions.


Brewster’s  Angle  —  Angle  of  Complete  Polarization 


Note that $r_{TM} = 0$ at one particular angle ($\theta_0 \approx 56°$) in the TM case
(parallel polarization), which means that no amplitude of this wave is reflected if
incident at this angle. In other words, any light reflected at this angle must be the
TE wave, which is completely polarized perpendicular to the plane of incidence. This
is Brewster’s angle, the angle such that the reflected wave and the refracted wave are
orthogonal (i.e., $\theta_0 + \theta_t = \frac{\pi}{2} \implies \theta_t = \frac{\pi}{2} - \theta_0$).
In this case, the electrons driven in the plane of incidence will not emit radiation
at the angle required by the law of reflection. This is sometimes called the angle of
complete polarization. Note that the transmitted light contains both polarizations,
though not in equal amounts.

Figure: Polarization of reflected light at Brewster’s angle. The incident beam at
$\theta_0 = \theta_B$ is unpolarized. The reflectance coefficient for light polarized
in the plane (TM waves) is 0, and the sum of the incident and refracted angles is
90° $= \frac{\pi}{2}$. Thus

$$\theta_B + \theta_t = \frac{\pi}{2} \implies \theta_t = \frac{\pi}{2} - \theta_B.$$

From Snell’s law, we have:

$$n_1 \sin[\theta_1] = n_2 \sin[\theta_2]$$

At Brewster’s angle,

$$n_1 \sin[\theta_B] = n_2 \sin\left[\frac{\pi}{2} - \theta_B\right]$$

$$= n_2 \left(\sin\left[\frac{\pi}{2}\right]\cos[\theta_B] - \cos\left[\frac{\pi}{2}\right]\sin[\theta_B]\right)$$

$$= +n_2 \cos[\theta_B]$$

$$n_1 \sin[\theta_B] = n_2 \cos[\theta_B] \implies \frac{n_2}{n_1} = \frac{\sin[\theta_B]}{\cos[\theta_B]} = \tan[\theta_B]$$

$$\implies \theta_B = \tan^{-1}\left[\frac{n_2}{n_1}\right]$$

If $n_1 = 1$ (air) and $n_2 = 1.5$ (glass), then $\theta_B \approx 56.3°$. For incident
angles near 56°, the reflected light is almost completely plane polarized perpendicular
to the plane of incidence. If the dense medium is water ($n_2 = 1.33$), then
$\theta_B \approx 53.1°$. This happens at the interface with any dielectric. The
reflection at Brewster’s angle provides a handy means to determine the polarization
axis of a linear polarizer: just look through a linear polarizer at light reflected at a
shallow angle relative to the surface (e.g., a waxed floor).
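Brewster’s angle is easy to evaluate numerically from $\tan[\theta_B] = n_2/n_1$. The short Python sketch below (the function name `brewster_angle_deg` is our own) also checks the geometric statement that the incident and refracted angles sum to 90° at this angle.

```python
import math

def brewster_angle_deg(n1, n2):
    """Brewster's angle in degrees from tan(theta_B) = n2 / n1."""
    return math.degrees(math.atan2(n2, n1))

for label, n1, n2 in (("air to glass", 1.0, 1.5),
                      ("air to water", 1.0, 1.33),
                      ("glass to air", 1.5, 1.0)):
    theta_b = brewster_angle_deg(n1, n2)
    # refracted angle from Snell's law at Brewster incidence
    theta_t = math.degrees(math.asin(n1 * math.sin(math.radians(theta_b)) / n2))
    print(label, round(theta_b, 1), round(theta_b + theta_t, 1))
```

For air to glass this gives about 56.3°, and the sum of the two angles is 90° in every case.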


Reflectance and Transmittance at “Rare-to-Dense” Interface

The reflectance and transmittance of the two polarizations with $n_1 = 1.0$ and
$n_2 = 1.5$ as functions of the incident angle $\theta_0$ show the zero reflectance of
the TM wave at Brewster’s angle.

Figure: Reflectance and transmittance for $n_1 = 1.0$ and $n_2 = 1.5$ for TE and TM
waves, plotted against $\theta_0$ (degrees). Note that $R_{TM} = 0$ and $T_{TM} = 1$
at one angle.


8.1.7  Reflection and Transmittance at “Dense-to-Rare” Interface, Critical Angle

At a “glass-to-air” interface where $n_1 > n_2$, the amplitude reflection coefficient
of the TM wave (p polarization) is:

$$r_{TM} = \frac{+n_2\cos[\theta_0] - n_1\cos[\theta_t]}{+n_2\cos[\theta_0] + n_1\cos[\theta_t]}$$

The numerator evaluates to zero for the particular incident angle (Brewster’s angle)
that satisfies:

$$n_2\cos[\theta_0] = n_1\cos[\theta_t] \implies \frac{n_1}{n_2} = \frac{\cos[\theta_0]}{\cos[\theta_t]}$$

A second special angle is the one for which Snell’s law requires the transmitted angle
to reach $\theta_t = \frac{\pi}{2}$, so that $\cos[\theta_t] = 0$:

$$\sin\left[\frac{\pi}{2}\right] = 1 = \frac{n_1}{n_2}\sin[\theta_0] \implies \sin[\theta_0] = \frac{n_2}{n_1}$$

If $n_1 = 1.5$ and $n_2 = 1.0$, then

$$\sin[\theta_0] = \frac{2}{3} \implies \theta_0 = 0.73 \text{ radians} = 41.8° = \theta_c$$

If the incident angle exceeds this value $\theta_c$, the critical angle, then the
amplitude reflectance coefficients $r_{TE}$ and $r_{TM}$ both have unit magnitude,
and thus the reflectances $R_{TE}$ and $R_{TM}$ are both unity. This means that light
incident for $\theta_0 > \theta_c$ is totally reflected. This is the source of total
internal reflection (TIR; “internal” because the reflection is from glass back into
glass). The phenomenon of TIR is the reason for the usefulness of optical fibers in
communications.
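The critical angle and the unit-magnitude reflection beyond it can be checked numerically. In this Python sketch the function names are our own, and the use of `cmath.sqrt` to obtain an imaginary $\cos[\theta_t]$ past the critical angle is one possible convention that we assume here; the choice of branch affects the phase of the coefficient but not its magnitude.

```python
import cmath
import math

def critical_angle_deg(n1, n2):
    """Critical angle in degrees from sin(theta_c) = n2 / n1 (requires n1 > n2)."""
    if n1 <= n2:
        raise ValueError("total internal reflection requires n1 > n2")
    return math.degrees(math.asin(n2 / n1))

def r_te_complex(n1, n2, theta0_deg):
    """TE amplitude reflection coefficient, allowing a complex cos(theta_t)."""
    theta0 = math.radians(theta0_deg)
    sin_t = n1 * math.sin(theta0) / n2
    cos_t = cmath.sqrt(1.0 - sin_t ** 2)   # purely imaginary beyond the critical angle
    cos0 = math.cos(theta0)
    return (n1 * cos0 - n2 * cos_t) / (n1 * cos0 + n2 * cos_t)

print(critical_angle_deg(1.5, 1.0))        # about 41.8 degrees
print(abs(r_te_complex(1.5, 1.0, 60.0)))   # magnitude 1 beyond the critical angle
```

Below the critical angle the same function returns the ordinary real coefficient (with a zero imaginary part).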

The angular dependences of the amplitude reflection coefficients for the case
$n_1 = 1.5$ (glass) and $n_2 = 1.0$ (air) are shown. Brewster’s angle in this case
satisfies:

$$\theta_B = \tan^{-1}\left[\frac{n_2}{n_1}\right] = \tan^{-1}\left[\frac{1}{1.5}\right] = 33.7°$$

Figure: Amplitude reflectance coefficients for TE and TM waves if $n_1 = 1.5$ (glass)
and $n_2 = 1.0$ (air). Both coefficients rise to $r = +1.0$ at the “critical angle”
$\theta_c$, for which $\theta_t = 90° = \frac{\pi}{2}$. Also noted is Brewster’s angle,
where $r_{TM} = 0$. The situation for $\theta_0 > \theta_c$ can be interpreted as
producing complex-valued $r_{TE}$ and $r_{TM}$.

8.1.8  Practical  Applications  for  Fresnel’s  Equations 

The  4 %  normal  reflectance  of  one  surface  of  glass  is  the  reason  why  windows  look 
like  mirrors  at  night  when  you’re  in  the  brightly  lit  room.  Lasers  incorporate  end 
windows  oriented  at  Brewster’s  angle  to  eliminate  reflective  losses  at  the  mirrors  (and 
also  thus  producing  polarized  laser  light).  Optical  fibers  use  total  internal  reflection. 
Hollow  fibers  use  high-incidence-angle  near-unity  reflections. 


8.2  Index  of  Refraction  of  Glass 

We have already stated that the index of refraction n relates the phase velocity of
light in vacuum with that in matter:

$$n = \frac{c}{v_\phi} > 1.$$

In a dispersive medium, the index n decreases with increasing $\lambda$, which ensures
that the phase velocity $v_\phi$ (of the average wave) is larger than the group
velocity (of the modulation wave).

Refraction is the result of the interaction of light with atoms in the medium and
depends on wavelength because the refractive index does also; recall that the index
decreases with increasing wavelength:


Figure: Typical dispersion curve for glass showing the decrease in n with increasing
$\lambda$ and the three spectral wavelengths used to specify “refractivity”, “mean
dispersion”, and “partial dispersion”.

To a first approximation, the index of refraction varies as $\lambda^{-1}$, which
allows us to write an empirical expression for the refractivity of the medium n − 1:

$$n[\lambda] - 1 = a + \frac{b}{\lambda}$$

where a and b are parameters determined from measurements. The observation that
the index decreases with increasing $\lambda$ determines that b > 0. Cauchy came up
with an empirical relation for the refractivity with more free parameters:

$$n[\lambda] - 1 = A\left(1 + \frac{B}{\lambda^2} + \frac{C}{\lambda^4} + \cdots\right)$$

Again, the behavior of normal dispersion ensures that A and B are both positive. Yet
a better formula was proposed by Hartmann:

$$n[\lambda] = n_0 + \frac{\alpha}{(\lambda - \lambda_0)^{1.2}}$$

where $\alpha > 0$. The refractive properties of the glass are approximately specified
by the refractivity and the measured differences in refractive index at the three
Fraunhofer wavelengths F, D, and C:
Quantity            Definition                                Notes
Refractivity        $n_D - 1$                                 $1.5 < n_D < 1.75$
Mean Dispersion     $n_F - n_C > 0$                           difference between blue and red indices
Partial Dispersion  $n_D - n_C > 0$                           difference between yellow and red indices
Abbe Number         $\nu = \frac{n_D - 1}{n_F - n_C}$         ratio of refractivity and mean dispersion, $25 < \nu < 65$

Glasses are specified by six-digit numbers abcdef where $n_D = 1.abc$, to three
decimal places, and $\nu = de.f$. Note that larger values of the refractivity mean that
the refractive index is larger and thus so is the deviation angle in Snell’s law. A
larger Abbe number means that the mean dispersion is smaller and thus there will be
a smaller difference in the angles of refraction. Such glasses with larger Abbe numbers
and smaller indices and less dispersion are crown glasses, while glasses with smaller
Abbe numbers are flint glasses, which are “denser”. Examples of glass specifications
include borosilicate crown glass (BSC), which has a specification number of 517645,
so its refractive index in the D line is 1.517 and its Abbe number is $\nu = 64.5$. The
specification number for a common flint glass is 619364, so $n_D = 1.619$ (relatively
large) and $\nu = 36.4$ (smallish). Now consider the refractive indices in the three
lines for two different glasses: “crown” (with a smaller n) and “flint”:


Line    λ [nm]    n for Crown    n for Flint
C       656.28    1.51418        1.69427
D       589.59    1.51666        1.70100
F       486.13    1.52225        1.71748

The glass specification numbers for the two glasses are evaluated to be:

For the crown glass:

refractivity: $n_D - 1 = 0.51666 \approx 0.517$

Abbe number: $\nu = \frac{1.51666 - 1}{1.52225 - 1.51418} \approx 64.0$

Glass number = 517640

For the flint glass:

refractivity: $n_D - 1 = 0.70100 \approx 0.701$

Abbe number: $\nu = \frac{1.70100 - 1}{1.71748 - 1.69427} \approx 30.2$

Glass number = 701302
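
The arithmetic above can be repeated for any glass once the three indices are known. The following is a minimal Python sketch, not part of the original notes; the function names `abbe_number` and `glass_number` are illustrative assumptions, and rounding to three digits is assumed to be how the code is formed.

```python
def abbe_number(n_c: float, n_d: float, n_f: float) -> float:
    """Abbe number: refractivity (n_D - 1) divided by mean dispersion (n_F - n_C)."""
    return (n_d - 1.0) / (n_f - n_c)


def glass_number(n_c: float, n_d: float, n_f: float) -> str:
    """Six-digit code abcdef with n_D = 1.abc and nu = de.f (rounding is an assumption)."""
    refractivity_digits = round((n_d - 1.0) * 1000)        # abc
    abbe_digits = round(abbe_number(n_c, n_d, n_f) * 10)   # def
    return f"{refractivity_digits:03d}{abbe_digits:03d}"


# Indices at the C, D, and F lines from the table above
crown = glass_number(1.51418, 1.51666, 1.52225)
flint = glass_number(1.69427, 1.70100, 1.71748)
print(crown, flint)
```

With the tabulated indices, the two codes agree with the hand calculation (517640 and 701302).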
8.2.1 Optical Path Length

Because the phase velocity of light in a medium is less than that in vacuum, light
takes longer to travel through a given thickness of material than through the same
“thickness” of vacuum. For a fixed distance d, we know that:

$$\begin{aligned}
d &= v \cdot t \quad (\text{distance} = \text{velocity} \times \text{time}) \\
  &= c \cdot t_1 \quad (\text{in vacuum}) \\
  &= \frac{c}{n} \cdot t_2 \quad (\text{in medium of index } n) \\
\Longrightarrow\ t_1 &= \frac{t_2}{n} \ \Longrightarrow\ t_2 > t_1
\end{aligned}$$

In the time $t_2$ required for light to travel the distance d in a material of index n,
light would travel a longer distance $nd = ct_2$ in vacuum. The distance nd traveled in
vacuum in the equivalent time is the optical path length in the medium.


8.3  Polarization 

Maxwell’s  equations  demonstrated  that  light  is  a  transverse  wave  (as  opposed  to 
longitudinal  waves,  e.g.,  sound).  Both  the  E  and  B  vectors  are  perpendicular  to 
the  direction  of  propagation  of  the  radiation.  Even  before  Maxwell,  Thomas  Young 
inferred  the  transverse  character  of  light  in  1817  when  he  passed  light  through  a  calcite 
crystal (calcium carbonate, CaCO$_3$). Two beams emerged from the crystal, which
Young brilliantly deduced were orthogonally polarized, i.e., the directions of the E
vectors of the two beams are orthogonal. The two components of an electromagnetic
wave are the electric field E [V/m] and the magnetic field B [tesla = webers/m$^2$].

The  polarization  of  radiation  is  defined  as  the  plane  of  vibration  of  the  electric 
vector  E,  rather  than  of  B,  because  the  effect  of  the  E  field  on  a  free  charge  (an 
electron) is much greater than the effect of B. This is seen from the Lorentz equation,
or the Lorentz force law:

$$\mathbf{F} \propto q_0 \left( \mathbf{E} + \frac{\mathbf{v}}{c} \times \mathbf{B} \right)$$

$q_0$ = charge [coulombs]
F = force on the charge [newtons, 1 N = 1 kg m/s$^2$]
v = velocity of the charge $q_0$, measured in [m/s]
c = velocity of light [$3 \cdot 10^8$ m/s]

The factor $c^{-1}$ ensures that the force on the electron due to the magnetic field is
usually much smaller than the electric force.

8.3.1  Plane  Polarization  =  Linear  Polarization 

The most familiar type of polarization is linear polarization, where the E-vector
oscillates in the same plane at all points on the wave.

Any state of linear polarization can be expressed as a linear combination (sum)
of two orthogonal states (basis states), e.g., the x- and y-components of the E-vector
for a wave traveling toward $z = +\infty$:

$$\mathbf{E} = \mathbf{E}[\mathbf{r}, t] = \left[ \hat{\mathbf{x}} E_x + \hat{\mathbf{y}} E_y \right] \cos[kz - \omega t]$$

$\hat{\mathbf{x}}, \hat{\mathbf{y}}$ = unit vectors along x and y
$E_x, E_y$ = amplitudes of the x- and y-components of E.

For a wave of amplitude $E_0$ polarized at an angle $\theta$ relative to the x-axis:

$$E_x = E_0 \cos[\theta]$$
$$E_y = E_0 \sin[\theta]$$

Linearly polarized radiation oscillates in the same plane at all times and at all points
in space. Especially note that $E_x$ and $E_y$ are in phase for linearly polarized light, i.e.,
both components have zero-crossings at the same point in time and space.




Electric  field  vector  E  and  magnetic  field  vector  H  of  a  plane-polarized  wave 


8.3.2  Circular  Polarization 

If the E-vector describes a helical (i.e., screw-like) motion in space, the projection
of the E-vector onto a plane normal to the propagation direction k exhibits circular
motion over time, hence the polarization is circular.

Circular polarization occurs when the electric fields along orthogonal axes have the
same amplitude but their phases differ by $\pm\frac{\pi}{2}$ radians.

If we sit at a fixed point in space $z = z_0$, the motion of the E-vector is the sum
of two orthogonal linearly polarized states, but with one component out of phase by
$90° = \frac{\pi}{2}$ radians. The math is identical to that used to describe oscillator motion as
the projection of rotary motion:

$$\text{motion} = \hat{\mathbf{x}} \cos[\omega t] + \hat{\mathbf{y}} \cos\left[\omega t \mp \frac{\pi}{2}\right] = \hat{\mathbf{x}} \cos[\omega t] \pm \hat{\mathbf{y}} \sin[\omega t]$$

For a traveling wave:

$$\begin{aligned}
\mathbf{E} = [E_x, E_y] &= \left[ E_0 \cos[kz - \omega t],\ E_0 \cos\left[kz - \omega t \mp \frac{\pi}{2}\right] \right] \\
&= \left[ E_0 \cos[kz - \omega t],\ \pm E_0 \sin[kz - \omega t] \right]
\end{aligned}$$

where the upper sign applies to right-handed circular polarization (angular momentum
convention).


8.3.3  Nomenclature  for  Circular  Polarization 

Like  linearly  polarized  light,  circularly  polarized  light  has  two  orthogonal  states,  i.e., 
clockwise  and  counterclockwise  rotation  of  the  E-vector.  These  are  termed  right- 
handed  (RHCP)  and  left-handed  (LHCP).  There  are  two  conventions  for  the  nomen¬ 
clature: 


1. Angular Momentum Convention (my preference): Point the thumb of the right (left)
hand in the direction of propagation. If the fingers curl in the direction of rotation
of the E-vector, then the light is RHCP (LHCP).

2. Optics (also called screwy) Convention: The path traveled by the E-vector of
RHCP light is the same path described by a right-hand screw. Of course, the
natural laws defined by Murphy ensure that the two conventions are opposite:
RHCP light by the angular momentum convention is LHCP by the screw convention.


8.3.4  Elliptical  Polarization,  Reflections 

If the amplitudes of the x- and y-components of the E-vector are not equal, or if the
phase difference is not $\pm\frac{\pi}{2} = \pm 90°$, then the projection of the path of the E-vector
is not a circle, but rather an ellipse. This results in elliptical polarization. Note that
elliptical polarization may be either right- or left-handed, as defined above.

8.3.5  Change  of  Handedness  on  Reflection 

By  conservation  of  angular  momentum,  the  direction  of  rotation  of  the  E- vector  does 
not  change  on  reflection.  Since  the  direction  of  propagation  reverses,  the  handedness 
of  the  circular  or  elliptical  polarization  changes: 


Change  in  “handedness  ”  of  a  circularly  polarized  wave  upon  reflection  by  a  mirror. 


Natural  Light 

The  superposition  of  emissions  from  a  large  number  of  thermal  source  elements  (as  in 
a  light  bulb)  has  a  random  orientation  of  polarizations.  The  state  of  polarization  of 
the resulting light changes direction randomly over very short time intervals ($\approx 10^{-8}$
seconds). The radiation is termed unpolarized, even though it is polarized when
viewed  within  this  short  time  period.  Natural  light  is  neither  totally  polarized  nor 
totally  unpolarized;  rather,  we  speak  of  partial  polarization. 
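8.4 Description of Polarization States

8.4.1 Jones Vector

The components of the electric field in the two orthogonal directions may be used to
represent a vector with complex components. This is called a Jones vector, which is
useful only for completely polarized light.

$$\begin{aligned}
\mathbf{E} = \mathrm{Re}\left\{ \mathbf{E}_0 e^{i[kz - \omega t]} \right\}
&= \left[ \mathrm{Re}\left\{ E_x e^{i[kz - \omega t]} \right\},\ \mathrm{Re}\left\{ E_y e^{i[kz - \omega t - \delta]} \right\} \right] \\
&= \mathrm{Re}\left\{ \left[ E_x,\ E_y e^{-i\delta} \right] e^{i[kz - \omega t]} \right\}
\end{aligned}$$

$$\Longrightarrow \text{Jones Vector } \mathcal{E} = \begin{bmatrix} E_x \\ E_y e^{-i\delta} \end{bmatrix}$$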

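Examples:

1. Plane-polarized light along x-axis:

$$\mathcal{E} = \begin{bmatrix} E_0 \\ 0 \end{bmatrix}$$

2. Plane-polarized light along y-axis:

$$\mathcal{E} = \begin{bmatrix} 0 \\ E_0 \end{bmatrix}$$

3. Plane-polarized light at angle $\theta$ to x-axis:

$$\mathcal{E} = \begin{bmatrix} E_0 \cos[\theta] \\ E_0 \sin[\theta] \end{bmatrix}$$

4. RHCP:

$$\begin{aligned}
\mathbf{E} &= \hat{\mathbf{x}} E_0 \cos[kz - \omega t] + \hat{\mathbf{y}} E_0 \sin[kz - \omega t] \\
&= \hat{\mathbf{x}} E_0 \cos[kz - \omega t] + \hat{\mathbf{y}} E_0 \cos\left[kz - \omega t - \frac{\pi}{2}\right] \\
&= \mathrm{Re}\left\{ \begin{bmatrix} E_0 \\ E_0 \exp\left[-i\frac{\pi}{2}\right] \end{bmatrix} e^{i[kz - \omega t]} \right\}
= \mathrm{Re}\left\{ E_0\, e^{i[kz - \omega t]} \begin{bmatrix} 1 \\ \exp\left[-i\frac{\pi}{2}\right] \end{bmatrix} \right\}
\end{aligned}$$

so the Jones vector is $\mathcal{E} = E_0 \begin{bmatrix} 1 \\ \exp\left[-i\frac{\pi}{2}\right] \end{bmatrix}$.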
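
The Jones vectors in the examples above can be checked numerically by taking the real part of the complex field. The following Python sketch is an illustration only, not part of the original notes; the helper names `jones_linear`, `jones_rhcp`, and `real_field` are assumptions, and the phase argument stands for $kz - \omega t$.

```python
import cmath
import math


def jones_linear(e0: float, theta: float) -> tuple[complex, complex]:
    """Jones vector for plane-polarized light at angle theta (radians) to the x-axis."""
    return (complex(e0 * math.cos(theta)), complex(e0 * math.sin(theta)))


def jones_rhcp(e0: float) -> tuple[complex, complex]:
    """Jones vector [E0, E0 exp(-i pi/2)] for RHCP (angular momentum convention)."""
    return (complex(e0), e0 * cmath.exp(-1j * math.pi / 2))


def real_field(jones: tuple[complex, complex], phase: float) -> tuple[float, float]:
    """Physical field Re{ jones * exp(i * phase) }, where phase = k z - omega t."""
    carrier = cmath.exp(1j * phase)
    return ((jones[0] * carrier).real, (jones[1] * carrier).real)


# For RHCP the field magnitude stays constant while its direction rotates.
for phase in (0.0, math.pi / 4, math.pi / 2):
    ex, ey = real_field(jones_rhcp(1.0), phase)
    print(f"phase={phase:.3f}  Ex={ex:+.3f}  Ey={ey:+.3f}  |E|={math.hypot(ex, ey):.3f}")
```

The RHCP components come out as $E_0\cos$ and $E_0\sin$ of the same phase, which is the form written in example 4.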


Other representations of the state of polarization are available (e.g., Stokes’
parameters, coherency matrix, Mueller matrix, Poincaré sphere). They are more
complicated, and hence more useful, i.e., they can describe partially polarized states.
For more information, see (for example) Polarized Light by Shurcliff.


8.5  Generation  of  Polarized  Light 

8.5.1  Selective  Emission: 

If  all  emitting  elements  of  a  source  (e.g.,  electrons  in  a  bulb  filament),  vibrate  in  the 
same  direction,  the  radiated  light  will  be  polarized  in  that  direction.  This  is  difficult 
to achieve at optical frequencies ($\Delta t \approx 10^{-14}$ s $\Rightarrow \nu \approx 10^{14}$ Hz), but is easy at radio
or microwave frequencies ($\nu < 10^{8}$ Hz) by proper design of the antenna that radiates
the  energy.  For  example,  a  radio-frequency  oscillator  attached  to  a  simple  antenna 
forces  the  free  electrons  in  the  antenna  to  oscillate  along  the  long  (vertical)  dimension 
of  the  antenna.  The  emitted  radiation  is  therefore  mostly  oscillating  in  the  vertical 
direction;  it  is  vertically  polarized. 


[Figure labels: Polarization of Emitted Radiation; Radio Frequency (RF) Oscillator ($\nu \approx 10^{8}$ Hz)]

“Light”  (electromagnetic  radiation)  emitted  by  a  “dipole”  radiator  is  polarized  in  the 
direction  of  motion  of  the  emitting  electrons  (vertical,  in  this  case). 


Rather  than  generating  polarized  light  at  the  source,  we  can  obtain  light  of  a  selected 
polarization  from  natural  light  by  removing  unwanted  states  of  polarization.  This  is 
the  mechanism  used  in  the  next  section. 


8.5.2  Selective  Transmission  or  Absorption 

A manmade device for selecting a state of polarization by selective absorption is
Polaroid. This operates like the microwave-polarizing skein of wires. The wires are
parallel to the y-axis in the figure. Radiation incident on the wires drives the free
electrons in the wires in the direction of polarization of the radiation. The electrons
driven in the y-direction move along the surface of the wire and collide with other such
electrons, thus dissipating the energy in thermal collisions. What energy is reradiated by
such electrons is mostly directed back toward the source (reflected). The x-component
of the polarization is not so affected, since the electrons in the wire are constrained
against movement in that direction. The x-component of the radiation therefore
passes nearly unaffected.

Common polaroid sheet acts as a skein of wires for optical radiation. It is made
from clear polyvinyl alcohol which has been stretched in one direction to produce long
chains of hydrocarbon molecules. The sheet is then immersed in iodine to supply lots
of free electrons.


[Figure labels: Unpolarized Light (random E’s); Parallel Wires; Transmitted Polarization]

Polarization by “skein of wires”: the radiation polarized parallel to the direction of
the wires in the skein is absorbed, so the radiation polarized perpendicular to the
wires is transmitted.

8.5.3  Generating  Polarized  Light  by  Reflection  —  Brewster’s 
Angle 

H§8.6 

The two polarizations of light reflected from an interface between two different
dielectric media (i.e., media with different real refractive indices) see the same
configuration of the interface only at normal incidence (i.e., when the light is incident
perpendicular to the surface), and in that case the two polarizations must be reflected
identically. However, if the light is incident obliquely, one polarization “sees” the
bound electrons of the surface differently and therefore is reflected differently. The
reflected wave is polarized to some extent; the amount of polarization depends on the
angle of incidence and the index of refraction n. The polarization mechanism is simply
pictured as a forced electron oscillator. The bound electrons in the dielectric material
are driven by the incident oscillating electric field of the radiation
$E \exp[i(k_0 z_0 \pm \omega_0 t)]$, and hence vibrate at frequency $\nu_0 = \frac{\omega_0}{2\pi}$. Due to its
acceleration, the vibrating electron reradiates radiation at the same frequency $\nu_0$ to
produce the reflected wave. The state of polarization of the reflected radiation is a
function of the polarization state of the incident wave, the angle of incidence, and the
indices of refraction on either side of the interface. If the reflected wave and the
refracted wave are orthogonal (i.e., $\theta_0 + \theta_t = 90° \Rightarrow \theta_t = \frac{\pi}{2} - \theta_0$), then the
reflected wave is completely plane polarized parallel to the surface (and thus polarized
perpendicular to the plane of incidence). This angle appeared in the discussion of the
reflectance coefficients in the previous section. In this case, the electrons driven in
the plane of incidence will not emit radiation at the angle required by the law of
reflection. This angle of complete polarization is called Brewster’s Angle $\theta_B$, which
we mentioned earlier during the discussion of the Fresnel equations.


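Brewster’s angle: the incident beam at $\theta_0 = \theta_B$ is unpolarized. The reflectance
coefficient for light polarized in the plane of incidence (TM waves) is 0, and the sum
of the incident and refracted angles is $90° = \frac{\pi}{2}$. Thus $\theta_B + \theta_t = \frac{\pi}{2} \Rightarrow \theta_t = \frac{\pi}{2} - \theta_B$.

At Brewster’s angle,

$$n_1 \sin[\theta_B] = n_2 \sin\left[\frac{\pi}{2} - \theta_B\right] = n_2 \cos[\theta_B]$$

$$\Longrightarrow \theta_B = \tan^{-1}\left[\frac{n_2}{n_1}\right]$$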

If $n_1 = 1$ (air) and $n_2 = 1.5$ (glass), then $\theta_B \approx 56.3°$. For light incident at
about 56°, the reflected light is plane polarized perpendicular to the plane of incidence.
If the dense medium is water ($n_2 = 1.33$), then $\theta_B \approx 53.1°$. This happens at the
interface with any dielectric. The reflection at Brewster’s angle provides a handy
means to determine the polarization axis of a linear polarizer: just look through the
polarizer at light reflected at a steep angle.
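
The Brewster-angle relation $\theta_B = \tan^{-1}[n_2/n_1]$ is easy to evaluate for other pairs of media. The following short Python sketch is an illustration only (the function name `brewster_angle_deg` is an assumption); it also confirms via Snell’s law that the incident and refracted angles sum to 90°.

```python
import math


def brewster_angle_deg(n1: float, n2: float) -> float:
    """Brewster's angle in degrees for light going from index n1 into index n2."""
    return math.degrees(math.atan2(n2, n1))


for name, n2 in (("glass", 1.5), ("water", 1.33)):
    theta_b = brewster_angle_deg(1.0, n2)
    # Refracted angle from Snell's law: n1 sin(theta_B) = n2 sin(theta_t), with n1 = 1
    theta_t = math.degrees(math.asin(math.sin(math.radians(theta_b)) / n2))
    print(f"{name}: theta_B = {theta_b:.1f} deg, theta_B + theta_t = {theta_b + theta_t:.1f} deg")
```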


8.5.4  Polarization  by  Scattering 

Light  impinging  on  an  air  molecule  drives  the  electrons  of  the  molecule  in  the  direc¬ 
tion  of  vibration  of  the  electric  field  vector.  This  motion  causes  light  to  be  reradiated 
in  a  dipole  pattern;  i.e.,  no  light  is  emitted  along  the  direction  of  electron  vibration. 
If  we  look  at  scattered  light  (e.g.,  blue  sky)  at  90°  from  the  source,  the  light  is  com¬ 
pletely  linearly  polarized.  Note  that  if  the  light  is  multiply  scattered,  as  in  fog,  each 
scattering  disturbs  the  state  of  polarization  and  the  overall  linear  state  is  perturbed 
into  unpolarized  radiation. 
Scattering of sunlight by atmospheric molecules.


8.6  Birefringence  —  Double  Refraction 


H§8.4 

Many natural crystals and manmade materials interact with the two orthogonal
polarizations differently. This is often due to an anisotropy (nonuniformity) in the
crystalline structure; such materials are called dichroic or birefringent. Many crystals
(e.g., calcite) divide a nonpolarized light wave into two components with orthogonal
polarizations. The two indices of refraction are sometimes denoted $n_f$ and $n_s$ for fast
and slow axes, where $n_f < n_s$. They are also denoted $n_o$ and $n_e$ for ordinary and
extraordinary axes. One ray is called the ordinary ray because it obeys Snell’s law for
refraction. The second, or extraordinary ray, does not obey Snell’s law. By dividing the
incoming natural light into two beams in such a crystal, we can select one of the two
polarizations.

8.6.1 Examples:

Refractive indices along the fast and slow axes at $\lambda = 589.3$ nm

Material                         $n_s$      $n_f$
Calcite (CaCO$_3$)               1.6584     1.4864
Crystalline Quartz (SiO$_2$)     1.5534     1.5443
Ice (crystalline H$_2$O)         1.313      1.309
Rutile (TiO$_2$)                 2.903      2.616
Sodium Nitrate (NaNO$_3$)        1.5854     1.3369

The wavelength of light in a medium is $\lambda' = \frac{\lambda}{n}$, so light along the two polarization
directions has different wavelengths:

$$\lambda'_s = \frac{\lambda}{n_s} < \lambda'_f = \frac{\lambda}{n_f}$$


8.6.2  Phase  Delays  in  Birefringent  Materials  —  Wave  Plates 


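Consider light incident on a birefringent material of thickness d. The electric field as
a function of distance z and time t is:

$$\mathbf{E}[z, t] = (\hat{\mathbf{x}} E_x + \hat{\mathbf{y}} E_y)\, e^{i(kz - \omega t)}$$

At the input face of the material (z = 0) and the output face (z = d), the fields are:

$$\mathbf{E}[z = 0, t] = (\hat{\mathbf{x}} E_x + \hat{\mathbf{y}} E_y)\, e^{-i\omega t}$$
$$\mathbf{E}[z = d, t] = (\hat{\mathbf{x}} E_x + \hat{\mathbf{y}} E_y)\, e^{i(kd - \omega t)}$$

If $n_x = n_s > n_y = n_f$, then $\lambda_f > \lambda_s$ and:

$$k_s = k_x = \frac{2\pi n_s}{\lambda} > k_f = k_y = \frac{2\pi n_f}{\lambda}$$

The field at the output face (z = d) is therefore:

$$\begin{aligned}
\mathbf{E}[d, t] &= \left[ \hat{\mathbf{x}} E_x \exp\left[ +i \frac{2\pi d \cdot n_s}{\lambda} \right] + \hat{\mathbf{y}} E_y \exp\left[ +i \frac{2\pi d \cdot n_f}{\lambda} \right] \right] e^{-i\omega t} \\
&= \left( \hat{\mathbf{x}} E_x + \hat{\mathbf{y}} E_y \exp\left[ i \frac{2\pi}{\lambda} d\, (n_f - n_s) \right] \right) \exp\left[ +i \left( \frac{2\pi d\, n_s}{\lambda} - \omega t \right) \right]
\end{aligned}$$

By defining a constant phase term $\delta = \frac{2\pi}{\lambda} d\, (n_f - n_s)$, the electric field at the output
face of the birefringent material can be expressed as:

$$\mathbf{E}[d, t] = \left( \hat{\mathbf{x}} E_x + \hat{\mathbf{y}} E_y e^{i\delta} \right) \exp\left[ +i \left( \frac{2\pi d\, n_s}{\lambda} - \omega t \right) \right]$$

On emergence from the material, the y-component of the polarization has a different
phase than the x-component; the phase difference is $\delta$.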


Example:

$\delta = \pm\frac{\pi}{2} \Rightarrow |n_f - n_s|\, d = \frac{\lambda}{4}$, and there is a phase difference of one quarter
wavelength between the x- and y-components of the wave. This is a quarter-wave plate.
The required thickness d of the material is:

$$d = \frac{\lambda}{4\, (n_s - n_f)}$$

And the emerging field is:

$$\mathbf{E}[d, t] = \left( \hat{\mathbf{x}} E_x + \hat{\mathbf{y}} E_y e^{\pm i\frac{\pi}{2}} \right) \exp[i\, (k_s d - \omega t)]$$

where the lower sign applies when $\delta$ is defined with the factor $(n_f - n_s)$ and $n_f < n_s$.
If $E_x = E_y$ (i.e., the incident wave is linearly polarized at 45° to the x-axis), then the
emerging wave is circularly polarized. This is the principle of the circular polarizer.


Example:

If $|\delta| = \pi$, then $d = \frac{\lambda}{2\, (n_s - n_f)}$ and the relative phase delay is 180°. Such a device
is a half-wave plate. If the incident light is linearly polarized along the orientation
midway between the fast and slow axes, the plane of polarization of the exiting linearly
polarized light is rotated by 90°.
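
The two wave-plate examples can be worked numerically. The following Python sketch is an illustration only, not part of the original notes: the helper names `wave_plate_thickness` and `retard` are assumptions, the indices are the tabulated values for crystalline quartz at 589.3 nm, and the phase $\delta$ is taken with the factor $(n_f - n_s)$.

```python
import cmath
import math


def wave_plate_thickness(wavelength: float, n_slow: float, n_fast: float, fraction: float) -> float:
    """Thickness giving a retardation of `fraction` waves: d = fraction * lambda / (n_s - n_f)."""
    return fraction * wavelength / (n_slow - n_fast)


def retard(ex: complex, ey: complex, delta: float) -> tuple[complex, complex]:
    """Apply the relative phase delta to the y-component: (Ex, Ey) -> (Ex, Ey exp(i delta))."""
    return (ex, ey * cmath.exp(1j * delta))


wavelength = 589.3e-9          # metres
n_s, n_f = 1.5534, 1.5443      # crystalline quartz

d_quarter = wave_plate_thickness(wavelength, n_s, n_f, 0.25)
d_half = wave_plate_thickness(wavelength, n_s, n_f, 0.50)
print(f"quarter-wave: {d_quarter * 1e6:.2f} um, half-wave: {d_half * 1e6:.2f} um")

# delta = (2 pi / lambda) d (n_f - n_s)
delta_q = 2 * math.pi / wavelength * d_quarter * (n_f - n_s)
delta_h = 2 * math.pi / wavelength * d_half * (n_f - n_s)

a = 1 / math.sqrt(2)           # linear polarization at 45 degrees: Ex = Ey
print(retard(a, a, delta_q))   # y-component is 90 degrees out of phase: circular
print(retard(a, a, delta_h))   # y-component changes sign: linear, rotated by 90 degrees
```

For quartz the quarter-wave thickness is only about 16 micrometres, which suggests why practical plates are often made several whole waves thicker than this minimum.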


8.6.3  Circular  Polarizer: 

A circular polarizer is a sandwich of a linear polarizer and a $\frac{\lambda}{4}$ plate, where the
polarizing axis is oriented midway between the fast and slow axes of the quarter-wave
plate. The LP ensures that equal amplitudes exist along both axes of the quarter-wave
plate, which delays one of the components to create circularly polarized light.
Light incident from the back side of a circular polarizer is not circularly polarized
on exit; rather it is linearly polarized. A circular polarizer can be recognized and
properly oriented by placing it on a reflecting object (e.g., a dime). If the image of
the coin is dark, the polarizer has the linear polarizer on top. This is because the
handedness of the light is changed on reflection; the light emerging from the $\frac{\lambda}{4}$ plate
is now linearly polarized perpendicular to the axis of the LP and no light escapes.


A  circular  polarizer  is  a  sandwich  of  a  linear  polarizer  and  a  quarter-wave  plate. 


8.7 Critical Angle - Total Internal Reflection

We also mentioned this phenomenon during the discussion of the Fresnel equations.
From Snell, we have the relation:

$$n_1 \sin[\theta_1] = n_2 \sin[\theta_2]$$

If $n_1 > n_2$, then a specific angle $\theta_1$ satisfies the condition:

$$\frac{n_1}{n_2} \sin[\theta_1] = 1 \Longrightarrow \sin[\theta_1] = \frac{n_2}{n_1} < 1 \Longrightarrow \theta_2 = \frac{\pi}{2}$$

which means that the outgoing ray is refracted parallel to the interface (“surface”).
The incident angle $\theta_1$ that satisfies this condition is the critical angle $\theta_c$:

$$\theta_c = \sin^{-1}\left[\frac{n_2}{n_1}\right]$$

For crown glass with $n_d = 1.52$, the critical angle is $\sin^{-1}\left[\frac{1}{1.52}\right] = 0.718$ radians
$\approx 41°$. For a common flint glass with $n_d = 1.70$, $\theta_c = 0.629$ radians $\approx 36°$. If the
incident angle $\theta_1 > \theta_c$ and $n_1 > n_2$ (e.g., the first medium is glass and the second is
air), then no real-valued solution for Snell’s law exists, and there is no refracted light.
This is the well-known phenomenon of total internal reflection: all of the incident
light is reflected at the interface.
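
The critical-angle values quoted above can be reproduced directly from Snell’s law. The following Python sketch is an illustration only; the function names `critical_angle` and `refracted_angle` are assumptions, and the second medium is taken to be air with index 1.

```python
import math
from typing import Optional


def critical_angle(n1: float, n2: float) -> Optional[float]:
    """Critical angle (radians) for light going from index n1 into n2; None if n1 <= n2."""
    if n1 <= n2:
        return None
    return math.asin(n2 / n1)


def refracted_angle(n1: float, n2: float, theta1: float) -> Optional[float]:
    """Refracted angle from Snell's law, or None when no real-valued solution exists."""
    s = n1 * math.sin(theta1) / n2
    if abs(s) > 1.0:
        return None  # total internal reflection
    return math.asin(s)


for name, n in (("crown", 1.52), ("flint", 1.70)):
    theta_c = critical_angle(n, 1.0)
    print(f"{name}: theta_c = {theta_c:.3f} rad = {math.degrees(theta_c):.1f} deg")

# Incidence at 45 degrees inside crown glass exceeds the critical angle
print(refracted_angle(1.52, 1.0, math.radians(45.0)))
```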


This may be analyzed rigorously by applying Maxwell’s equations to show that the
refracted angle $\theta_2$ is complex valued instead of real valued, so that the electromagnetic
field is attenuated exponentially as it crosses the interface. In other words, the electric
field decays so rapidly across the interface that no energy can flow across the boundary,
and hence no light escapes. However, we can “frustrate” the total internal reflection by
placing another medium (such as another piece of glass) within a few light wavelengths
of the interface. If close enough to the boundary, then some electric field can get into
the second glass and a refracted wave “escapes”.


Schematic  of  “frustrated  total  internal  reflection”:  some  energy  can  “jump”  across  a 
small  gap  between  two  pieces  of  glass  even  though  the  incident  angle  exceeds  the 
critical  angle.  As  the  width  r  of  the  gap  increases,  then  the  quantity  of  energy 
coupled  across  the  gap  decreases  very  quickly. 


Chapter  9 

Optical  Imaging 


9.1  Transition  from  Wave  Optics  to  Ray  Optics 

We  have  mentioned  that  the  rigorous  evaluation  of  light  upon  interaction  with  mat¬ 
ter  (as  in  diffracting  apertures,  mirrors,  or  lenses)  involves  the  solution  of  the  four 
equations  collected  by  Maxwell  subject  to  the  specific  conditions  of  the  problem.  The 
exact  solution  to  these  equations  is  often  very  difficult  to  obtain.  Fortunately,  it  is 
often  sufficient  to  find  approximate  solutions.  The  very  useful  approximation  relevant 
to  imaging  applications  is  the  model  of  light  as  rays ,  which  is  generally  called  geomet¬ 
rical  optics.  This  approximation  emphasizes  the  path  travelled  by  light  through  media 
to  find  the  locations  and  sizes  of  images.  The  other  model  of  optics  as  waves,  called 
physical optics, emphasizes the “deviation” or “spreading” of light from the geometrical
paths to create interference and/or diffraction, and demonstrates the fundamental
limitations on the performance (i.e., the resolution) of optical imaging systems.

Ray:  a  line  in  space  that  maps  the  direction  of  energy  flow.  It  is  a 
mathematical  construction,  not  an  actual  entity. 

The  geometrical  optics  approximation  is  a  limiting  case  of  the  more  general  wave 
optical  model  in  the  limit  that  the  wavelength  A  of  light  goes  to  zero.  As  we  shall  see 
later,  there  is  no  diffraction  in  the  wave  model  in  this  case. 

9.1.1  Notational  Conventions 

One of the more confusing and frustrating aspects of geometrical optics is the existence
of multiple notational conventions. These notes use the convention of directed
distances, which is also used by Halliday and Resnick, Jenkins and White, Hecht,
Nussbaum and Phillips, Crawford, Iizuka, Goodman, and Gaskill. The other common
convention is based on a coordinate system with the origin at the first vertex of
the optical system, and is used by many authors for lens design, e.g., Born and Wolf,
Warren Smith, and Ditchburn.

A powerful advantage of the directed distance convention is the resulting (and
pleasing) symmetry between objects and images; in addition, the nature of the
resulting images is obvious.

[Figure: positive and negative examples of distance, height, angle (measured from normal), and curvature (from vertex to center).]


1.  Light  travels  from  left  to  right  (positive  direction); 

2.  Interfaces  (i.e.,  lens  or  mirror  surfaces)  are  numbered  from  the  first  to  the  last 
encountered  by  the  ray; 

3.  Distances  are  measured  from  lens  vertices ,  the  intersection  of  the  lens  surface 
with  the  axis  of  symmetry  (optical  axis); 

4.  A  horizontal  distance  is  positive  if  measured  from  left  to  right ; 

5.  A  vertical  distance  from  the  axis  is  positive  if  measured  “up ” ; 

6. Angles are measured from the optical axis or from the normal to the surface
and are positive if in the counterclockwise direction (the normal convention for
angles);

7. A radius of curvature is positive if the center is to the right of the vertex;

8. A subscript on a quantity corresponds to the surface with which it is associated;

9.  If  used,  primed  quantities  (e.g.,  n')  refer  to  the  “outgoing”  side  of  an  interface. 
These  are  useful  when  describing  a  multiple  element  system  where  the  output 
(image)  space  for  one  element  is  the  input  (object)  space  for  the  next  element. 


9.1.2 Fermat’s Principle

Hero of Alexandria hypothesized a model of light propagation that could be called
the principle of least distance:

A ray of light traveling between two arbitrary points
traverses the shortest possible path in space.

This statement applies to reflection and transmission through homogeneous media
(i.e., the medium is characterized by a single index of refraction). However, Hero’s
principle is not valid if the object and observation points are located in different media
(i.e., the normal situation for refraction) or if multiple media are present between the
points.

In  1657,  Pierre  Fermat  modified  Hero’s  statement  to  formulate  the  principle  of 
least  time: 


A  light  ray  travels  the  path  that  requires  the  least  time  to  traverse. 


The laws of reflection and refraction may be easily derived from Fermat’s principle.
A moving ray (or car, bullet, or baseball) traveling a distance s at a velocity v requires
t seconds:

$$t = \frac{s}{v}$$

If the ray travels at different velocities for different increments of distance, the travel
time may be written as:

$$t = \sum_{m=1}^{M} \frac{s_m}{v_m}$$

We know that the velocity of a light ray in a medium of index n is $v = \frac{c}{n}$, so that:

$$t = \sum_{m=1}^{M} \frac{n_m s_m}{c} = \frac{1}{c} \sum_{m=1}^{M} n_m s_m$$

Thus the time required to traverse the path through the medium is equal to the time
required to travel a longer path $\ell$ in vacuum; the path is longer because $n_m > 1$.
This longer path $\ell = ns$ is called the optical path length. This means that the light
requires the least time to traverse the path with the shortest optical path length. The
principle of least time may be reworded as:

A ray traverses the route with the shortest optical path length.
This result may be derived from Maxwell’s equations.
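
The sum over path segments is simple to evaluate. The following Python sketch is an illustration only; the segment list is a hypothetical example (air, glass, water), and the names `optical_path_length` and `travel_time` are assumptions rather than anything defined in the notes.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum [m/s]


def optical_path_length(segments: list[tuple[float, float]]) -> float:
    """Sum of n_m * s_m over (refractive index, geometric length in metres) pairs."""
    return sum(n * s for n, s in segments)


def travel_time(segments: list[tuple[float, float]]) -> float:
    """Travel time t = (1/c) * sum(n_m * s_m), in seconds."""
    return optical_path_length(segments) / C_VACUUM


# Hypothetical path: 10 cm of air, 2 cm of glass, 5 cm of water
path = [(1.0, 0.10), (1.5, 0.02), (1.33, 0.05)]
print(f"geometric length = {sum(s for _, s in path):.4f} m")
print(f"optical path length = {optical_path_length(path):.4f} m")
print(f"travel time = {travel_time(path):.3e} s")
```

The optical path length exceeds the geometric length because every index in the path is at least 1.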


Fermat’s  Principle  for  Reflection 

Now consider the path traveled upon reflection that minimizes an easily evaluated
optical path length:

Schematic for determining the angle of reflection using Fermat’s principle

As drawn, the angle $\theta_1$ is positive (measured from the normal to the ray) and $\theta_2$ is
negative (from the normal to the ray). The ray travels in the same medium of index
n both before and after reflection. The components of the optical path length are:

$$\overline{so} = \sqrt{h^2 + x^2}$$
$$\overline{op} = \sqrt{b^2 + (a - x)^2}$$

And the expression for the total optical path length $\ell$ is:

$$\ell = n\, (\overline{so} + \overline{op}) = n \left( \sqrt{h^2 + x^2} + \sqrt{b^2 + (a - x)^2} \right)$$

which is a function of x.

By Fermat’s principle, the path traveled is the one that minimizes the optical path
length $\ell$, so the position of o may be found by setting the derivative of $\ell$ with respect
to x to zero:

$$\frac{d\ell}{dx} = n \left( \frac{2x}{2\sqrt{h^2 + x^2}} + \frac{-2\, (a - x)}{2\sqrt{b^2 + (a - x)^2}} \right) = 0$$

$$\Longrightarrow \frac{x}{\sqrt{h^2 + x^2}} - \frac{a - x}{\sqrt{b^2 + (a - x)^2}} = 0$$

$$\Longrightarrow \frac{x}{\sqrt{h^2 + x^2}} = \frac{a - x}{\sqrt{b^2 + (a - x)^2}}$$

From the drawing, note that

$$\sin[\theta_1] = \frac{x}{\sqrt{h^2 + x^2}}$$

$$\sin[-\theta_2] = \frac{a - x}{\sqrt{b^2 + (a - x)^2}}$$

$$\Longrightarrow \sin[\theta_1] = \sin[-\theta_2] \Longrightarrow -\theta_1 = \theta_2$$

In  words,  the  magnitudes  of  the  angles  of  incidence  and  reflection  are  equal  (as  already 
derived  by  evaluating  Maxwell’s  equations  at  the  boundary).  The  negative  sign  is 
necessary  because  of  the  sign  convention  for  the  angle;  the  angle  is  measured  from 
the  normal  and  increases  in  the  counterclockwise  direction. 
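
The result for reflection can also be checked by brute force: minimize the path length over the mirror position x and compare the sines of the two angles. The following Python sketch is an illustration only; the geometry values (h, b, a) are hypothetical, and a simple ternary search stands in for the calculus, which works here because the path length has a single minimum in x.

```python
import math


def path_length(x: float, h: float, b: float, a: float, n: float = 1.0) -> float:
    """Optical path length n * (sqrt(h^2 + x^2) + sqrt(b^2 + (a - x)^2))."""
    return n * (math.hypot(h, x) + math.hypot(b, a - x))


def minimize(f, lo: float, hi: float, iterations: int = 200) -> float:
    """Ternary search for the minimum of a function with one minimum on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


h, b, a = 2.0, 1.0, 3.0  # hypothetical source height, observer height, separation
x_min = minimize(lambda x: path_length(x, h, b, a), 0.0, a)

sin_in = x_min / math.hypot(h, x_min)             # sine of the incidence angle
sin_out = (a - x_min) / math.hypot(b, a - x_min)  # sine of the (unsigned) reflection angle
print(f"x = {x_min:.6f}, sin(incidence) = {sin_in:.6f}, sin(reflection) = {sin_out:.6f}")
```

For this geometry the minimum falls at x = ah/(h + b) = 2, where both sines are equal.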


Fermat’s  Principle  for  Refraction: 


Schematic  for  refraction  using  Fermat’s  principle. 


In this drawing, both $\theta_1$ and $\theta_2$ are positive (measured from the normal to the
ray). The optical path length is:

$$\ell = n_1\, \overline{so} + n_2\, \overline{op} = n_1 \sqrt{h^2 + x^2} + n_2 \sqrt{b^2 + (a - x)^2}$$

By Fermat’s principle, the path traveled is the one that minimizes $\ell$, so we again set
the derivative of $\ell$ with respect to x to zero:

$$\frac{d\ell}{dx} = n_1 \frac{2x}{2\sqrt{h^2 + x^2}} + n_2 \frac{-2\, (a - x)}{2\sqrt{b^2 + (a - x)^2}} = 0$$

$$\Longrightarrow n_1 \frac{x}{\sqrt{h^2 + x^2}} = n_2 \frac{a - x}{\sqrt{b^2 + (a - x)^2}}$$

From the drawing:

$$\sin[\theta_1] = \frac{x}{\sqrt{h^2 + x^2}}$$

$$\sin[\theta_2] = \frac{a - x}{\sqrt{b^2 + (a - x)^2}}$$

$$\Longrightarrow n_1 \sin[\theta_1] = n_2 \sin[\theta_2]$$

which is Snell’s Law for refraction.


Note  that  with  this  sign  convention,  Snell’s  law  may  be  applied  to  reflection  by  setting 
the  refractive  index  of  the  second  medium  to  be  the  negative  of  the  first: 


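$$n_1 \sin[\theta_1] = n_2 \sin[\theta_2]$$

$$\Longrightarrow n_1 \sin[\theta_1] = -n_1 \sin[\theta_2]$$

$$\Longrightarrow -\sin[\theta_1] = \sin[\theta_2]$$

$$\Longrightarrow -\theta_1 = \theta_2$$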
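
The Fermat argument for refraction can be checked numerically. The sketch below is only an illustration: the geometry values (h, b, a), the indices, and the helper names are assumptions chosen for the example, and a golden-section search is used as one convenient way to minimize the optical path length.

```python
import math

def optical_path_length(x, h, b, a, n1, n2):
    """OPL for a ray that starts at height h above the interface, crosses it
    at horizontal position x, and ends at depth b, a horizontal distance a away."""
    return n1 * math.hypot(h, x) + n2 * math.hypot(b, a - x)

def fermat_crossing_point(h, b, a, n1, n2, tol=1e-10):
    """Golden-section search for the x in [0, a] that minimizes the OPL."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, a
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if optical_path_length(x1, h, b, a, n1, n2) < optical_path_length(x2, h, b, a, n1, n2):
            hi = x2
        else:
            lo = x1
    return 0.5 * (lo + hi)

# Illustrative (assumed) geometry in arbitrary units, air into glass.
h, b, a = 1.0, 2.0, 3.0
n1, n2 = 1.0, 1.5

x = fermat_crossing_point(h, b, a, n1, n2)
sin_theta1 = x / math.hypot(h, x)
sin_theta2 = (a - x) / math.hypot(b, a - x)

# At the minimum, the two sides of Snell's law should agree closely.
print(n1 * sin_theta1, n2 * sin_theta2)
```

Setting n2 equal to n1 in this sketch gives equal sines, which is the straight-line path in a uniform medium.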


Chapter  10 


Image  Formation  in  the  Ray  Model 


We know that light rays are deviated at interfaces between media with different
refractive indices. The goal in this section is to use interfaces of specified shapes
to “collect” and “redirect” rays (or to “collect” and “reshape” wavefronts that are
normal to the family of rays) in such a way as to create “images” of the original source(s).


10.1  Refraction  at  a  Spherical  Surface 


Refraction at a spherical surface bounding media with refractive indices $n_1$ and $n_2$.

Consider a point source located at o. The distance to the vertex v is $\overline{ov} = s_1 > 0$
as drawn. The distance from vertex v to the point p is $\overline{vp} = s_2 > 0$. The distance
traveled by a ray in medium $n_1$ to the surface is $\overline{oa} = \ell_1$ and the distance in medium
$n_2$ is $\overline{ap} = \ell_2$. The radius of curvature of the surface is $\overline{vc} = \overline{ac} = R > 0$. For
emphasis, we repeat that $s_1$, $s_2$, and $R$ are all positive in our convention. The ray
intersects the surface at the “position angle” $\varphi$ measured from the center of curvature
c to a. The optical path length of the ray from o to p that passes through a is
$OPL = n_1\ell_1 + n_2\ell_2$. We will evaluate this length by considering the triangles $\triangle oac$
and $\triangle acp$; the sides oa and ap may be evaluated by applying the law of
cosines:

$$\triangle oac \Longrightarrow |oa|^2 = |oc|^2 + |ac|^2 - 2\,|oc|\,|ac| \cos[\varphi]$$

$$\ell_1^2 = (s_1 + R)^2 + R^2 - 2R\,(s_1 + R) \cos[\varphi]$$

$$\ell_1 = \sqrt{(s_1 + R)^2 + R^2 - 2R\,(s_1 + R) \cos[\varphi]}$$

$$\triangle acp \Longrightarrow |ap|^2 = |ac|^2 + |cp|^2 - 2\,|ac|\,|cp| \cos[\pi - \varphi]$$

$$\ell_2^2 = (s_2 - R)^2 + R^2 - 2R\,(s_2 - R) \cos[\pi - \varphi]$$

$$\ell_2 = \sqrt{(s_2 - R)^2 + R^2 + 2R\,(s_2 - R) \cos[\varphi]} = \sqrt{(s_2 - R)^2 + R^2 - 2R\,(R - s_2) \cos[\varphi]}$$

Therefore the optical path length is:

$$OPL = n_1\ell_1 + n_2\ell_2 = n_1 \sqrt{(s_1 + R)^2 + R^2 - 2R\,(R + s_1) \cos[\varphi]} + n_2 \sqrt{(s_2 - R)^2 + R^2 - 2R\,(R - s_2) \cos[\varphi]}$$
which is a function of the “position angle” $\varphi$. We can now apply Fermat’s
principle and evaluate the angle $\varphi$ for which the OPL is a minimum:

$$\frac{d}{d\varphi}(OPL) = 0$$

$$\frac{n_1 \cdot 2R\,(R + s_1) \sin[\varphi]}{2\sqrt{(s_1 + R)^2 + R^2 - 2R\,(R + s_1) \cos[\varphi]}} + \frac{n_2 \cdot 2R\,(R - s_2) \sin[\varphi]}{2\sqrt{(s_2 - R)^2 + R^2 - 2R\,(R - s_2) \cos[\varphi]}} = 0$$

$$R \sin[\varphi] \left(\frac{n_1 (R + s_1)}{\ell_1} + \frac{n_2 (R - s_2)}{\ell_2}\right) = 0$$

$$\Longrightarrow \frac{n_1 (R + s_1)}{\ell_1} + \frac{n_2 (R - s_2)}{\ell_2} = 0$$

$$\Longrightarrow \frac{n_1 R}{\ell_1} + \frac{n_2 R}{\ell_2} = \frac{n_2 s_2}{\ell_2} - \frac{n_1 s_1}{\ell_1}$$

$$\Longrightarrow \frac{n_1}{\ell_1} + \frac{n_2}{\ell_2} = \frac{1}{R}\left(\frac{n_2 s_2}{\ell_2} - \frac{n_1 s_1}{\ell_1}\right) \quad \text{for one spherical surface}$$


This relation between the physical path lengths $\ell_1$ and $\ell_2$ and the distances $s_1$ and $s_2$
is exact. We now evaluate the ratio of the physical path length $\ell_1$ from o to a to
the axial distance $s_1$ from o to the surface vertex v:

$$\frac{\ell_1}{s_1} = \frac{\sqrt{(s_1 + R)^2 + R^2 - 2R\,(s_1 + R) \cos[\varphi]}}{s_1} = \left(\frac{(s_1 + R)^2 + R^2 - 2R\,(s_1 + R) \cos[\varphi]}{s_1^2}\right)^{\frac{1}{2}}$$

$$= \left(\frac{s_1^2 + R^2 + 2Rs_1 + R^2 - 2R^2 \cos[\varphi] - 2Rs_1 \cos[\varphi]}{s_1^2}\right)^{\frac{1}{2}} = \left(1 + \frac{2R\,(R + s_1)}{s_1^2}\,(1 - \cos[\varphi])\right)^{\frac{1}{2}}$$

This relation also is exact.

10.1.1 Paraxial Approximation

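We can expand the cosine of the position angle into its Taylor series:

$$\cos[\varphi] = 1 - \frac{\varphi^2}{2!} + \frac{\varphi^4}{4!} - \frac{\varphi^6}{6!} + \cdots$$

If the angle $\varphi$ is very small (so that the ray remains close to the axis), we can
approximate the cosine by truncating the series after the first term:

$$\text{if } \varphi \cong 0 \Longrightarrow \cos[\varphi] \cong 1 \Longrightarrow 1 - \cos[\varphi] \cong 0 \Longrightarrow \frac{\ell_1}{s_1} \cong 1 \Longrightarrow \ell_1 \cong s_1, \text{ and } \ell_2 \cong s_2$$

In this, the paraxial approximation (the resulting approximate calculations are called
first-order optics or Gaussian optics), we have:

$$\frac{1}{R}\left(\frac{n_2 s_2}{\ell_2} - \frac{n_1 s_1}{\ell_1}\right) \cong \frac{1}{R}\,(n_2 - n_1)$$

$$\frac{n_1}{s_1} + \frac{n_2}{s_2} = \frac{1}{R}\,(n_2 - n_1)$$

which is the paraxial imaging equation for a single spherical surface.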
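
As a rough numerical illustration of how the paraxial equation relates to the exact relation, the sketch below solves the paraxial equation for $s_2$ and then evaluates how well that $s_2$ satisfies the exact relation at several position angles. The function names and the sample values (a glass surface with R = 50 mm and an object at 200 mm) are assumptions made only for this example.

```python
import math

def paraxial_image_distance(s1, R, n1, n2):
    """Solve n1/s1 + n2/s2 = (n2 - n1)/R for s2 (same length units throughout)."""
    return n2 / ((n2 - n1) / R - n1 / s1)

def exact_residual(s1, s2, R, n1, n2, phi):
    """Left side minus right side of the exact relation
    n1/l1 + n2/l2 = (1/R) * (n2*s2/l2 - n1*s1/l1) at position angle phi."""
    l1 = math.sqrt((s1 + R) ** 2 + R ** 2 - 2.0 * R * (s1 + R) * math.cos(phi))
    l2 = math.sqrt((s2 - R) ** 2 + R ** 2 + 2.0 * R * (s2 - R) * math.cos(phi))
    return n1 / l1 + n2 / l2 - (n2 * s2 / l2 - n1 * s1 / l1) / R

# Assumed sample values (mm): air-to-glass surface.
s1, R, n1, n2 = 200.0, 50.0, 1.0, 1.5
s2 = paraxial_image_distance(s1, R, n1, n2)
print("paraxial s2 =", s2)

# The residual vanishes on axis and grows with the position angle,
# which is the signature of aberration away from the paraxial region.
for phi in (0.0, 0.01, 0.1, 0.3):
    print(phi, exact_residual(s1, s2, R, n1, n2, phi))
```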


Snell’s  Law  in  the  Paraxial  Approximation 

Recall  Snell’s  law  that  relates  the  ray  angles  before  and  after  refraction: 

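$$n_1 \sin[\theta_1] = n_2 \sin[\theta_2]$$

In the paraxial approximation where $\theta \cong 0$, the refraction equation simplifies to:

$$n_1\theta_1 \cong n_2\theta_2 \Longrightarrow \frac{\theta_2}{\theta_1} \cong \frac{n_1}{n_2} \Longrightarrow \theta_2 \cong \frac{n_1}{n_2}\,\theta_1$$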

In  words,  the  ray  angle  after  refraction  is  proportional  to  that  before  refraction. 
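
A short comparison of the exact and paraxial forms of Snell's law shows where the proportionality starts to fail. This is only a sketch; the function names and the choice of an air-to-glass interface (indices 1.0 and 1.5) are assumptions for illustration.

```python
import math

def refraction_angle_exact(theta1, n1, n2):
    """Refracted angle (radians) from n1*sin(theta1) = n2*sin(theta2)."""
    return math.asin(n1 * math.sin(theta1) / n2)

def refraction_angle_paraxial(theta1, n1, n2):
    """Refracted angle (radians) from the paraxial form n1*theta1 = n2*theta2."""
    return n1 * theta1 / n2

# Compare the two forms for several incident angles (degrees).
for deg in (1, 5, 20, 40):
    theta1 = math.radians(deg)
    exact = refraction_angle_exact(theta1, 1.0, 1.5)
    approx = refraction_angle_paraxial(theta1, 1.0, 1.5)
    print(deg, math.degrees(exact), math.degrees(approx))
```

The two columns agree closely at the smallest angles and drift apart as the incident angle grows.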


Power 

The  value  of  a  lens  or  lens  system  is  due  to  its  ability  to  “redirect”  rays  by  changing 
their  direction.  This  capability  is  described  by  the  power  of  the  lens;  a  lens  system 
with  a  large  power  changes  the  angles  of  rays  by  a  large  amount.  A  lens  system  that 
does  not  change  the  angles  of  rays  has  zero  power.  Most  people  are  more  familiar 
with the concept of “focal length”, which is the reciprocal of the power. The power is
measured in units of reciprocal length; if the focal length is measured in meters, the
units of power are diopters.

Object- and Image-Space Focal Lengths


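Now consider some pairs of object and image distances $s_1$ and $s_2$. If the object is
infinitely distant ($s_1 = \infty$), then:

$$\frac{n_1}{\infty} + \frac{n_2}{s_2} = \frac{n_2}{s_2} = \frac{1}{R}\,(n_2 - n_1) \Longrightarrow s_2 = \frac{n_2 R}{n_2 - n_1} \equiv f_2$$

the “image-space focal length” of the single surface.

If the image is located at $+\infty$, we have:

$$\frac{n_1}{s_1} + \frac{n_2}{\infty} = \frac{1}{R}\,(n_2 - n_1) \Longrightarrow s_1 = \frac{n_1 R}{n_2 - n_1} \equiv f_1$$

the “object-space focal length” of the single surface.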


Also note that:

$$\frac{f_1}{f_2} = \frac{\left(\dfrac{n_1 R}{n_2 - n_1}\right)}{\left(\dfrac{n_2 R}{n_2 - n_1}\right)} = \frac{n_1}{n_2}$$

In words, the ratio of the object-space and image-space focal lengths of the single
surface between two media equals the ratio of the indices of refraction.

To summarize, paraxial optics evaluates the exact trigonometric expressions in the
limit where the ray heights and ray angles approach zero. The resulting expressions are
accurate ONLY within an infinitesimal region centered on the optical axis of symmetry
(the imaginary line through the centers of curvature of the surfaces of the system).
Though limited in its descriptive accuracy of the properties of the system, the paraxial
approximation results in a set of simple equations that are accurate for locating the
axial positions of images. However, because extended objects consist of point sources
at various positions in space distant from the optical axis (and thus do not fulfill
the requirements of the paraxial approximation), the quality of the image cannot be
inferred from this description. “Defects” in the image are due to “aberrations” in
the optical system, which are deviations from the paraxial approximation where the
output angle is proportional to the input angle.



10.1.2  Nature  of  Objects  and  Images: 

1. Real Object: Rays incident on the lens are diverging from the source; the
object distance is positive.

2. Virtual Object: Rays converge toward the “source”, which is “behind” the
lens; the object distance is negative.

3. Real Image: Rays converge from the lens toward the image; the image distance
is positive.

4. Virtual Image: Rays diverge from the lens, so that the “image” is behind the
lens; the image distance is negative.



10.2  Imaging  With  Lenses 


Normally  we  do  not  consider  the  case  of  an  object  in  one  medium  with  the  image 
in  another  -  usually  both  object  and  image  are  in  air  and  a  lens  (a  “device”  with 
another  index  and  two  usually  curved  surfaces)  forms  the  image.  We  can  derive  the 
formula  for  the  object  and  image  distances  if  we  know  the  radii  of  the  lens  surfaces 
and  the  indices  of  refraction.  We  merely  cascade  the  paraxial  formulas  for  a  single 
surface: 


$$\text{At first surface: } \frac{n_1}{s_1} + \frac{n_2}{s_1'} = \frac{n_2 - n_1}{R_1}$$

$$\text{At second surface: } \frac{n_2}{s_2} + \frac{n_3}{s_2'} = \frac{n_3 - n_2}{R_2}$$


where $s_1$ is the (usually known) object distance, $s_1'$ is the image distance for rays
refracted by the first surface, $s_2$ is the object distance for the second surface, and $s_2'$
is the image distance for rays exiting the second surface (and thus from the lens). For
the common “convex-convex” lens, the center of curvature of the first surface is to the right
of the vertex, and thus the radius $R_1$ of the first surface is positive. Since the vertex
is to the right of the center of curvature of the second surface, $R_2 < 0$. If the lens is
“thin”, then the ray encounters the second surface immediately after refraction at the
first surface, so the magnitude of the image distance for the front surface $|s_1'|$ is the
same as the magnitude of the object distance for the second surface $|s_2|$. From the directed
distance convention, $s_2 = -s_1'$. If the lens is “thick”, then $|s_1'| \neq |s_2|$, so we define the
thickness $t$ to satisfy:

$$s_1' + s_2 = t \Longrightarrow s_1' = t - s_2$$

To find the single imaging equation, we add the equations for the two surfaces:

$$\left(\frac{n_1}{s_1} + \frac{n_2}{s_1'}\right) + \left(\frac{n_2}{s_2} + \frac{n_3}{s_2'}\right) = \frac{n_2 - n_1}{R_1} + \frac{n_3 - n_2}{R_2} = \frac{n_3}{R_2} + n_2\left(\frac{1}{R_1} - \frac{1}{R_2}\right) - \frac{n_1}{R_1}$$

For a thin lens ($t = 0$), substitute $-s_2$ for $s_1'$, so that the two middle terms on the left cancel:

$$\frac{n_1}{s_1} + \frac{n_3}{s_2'} = \frac{n_3}{R_2} + n_2\left(\frac{1}{R_1} - \frac{1}{R_2}\right) - \frac{n_1}{R_1}$$

where the object is in index $n_1$, the lens has index $n_2$, and the image is in index $n_3$.

In the usual case, the object and image are both in air, so that $n_3 = n_1 = 1$.
The simplified expression of the power of a thin lens is encapsulated in the so-called
lensmaker’s equation:

$$\frac{1}{s_1} + \frac{1}{s_2'} = n_2\left(\frac{1}{R_1} - \frac{1}{R_2}\right) + \frac{1}{R_2} - \frac{1}{R_1} = (n_2 - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)$$

$$\phi = \frac{1}{f} = (n_2 - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)$$

which defines the focal length of a lens with known index and surface radii. Note that
the object distance $s_1$ and the image distance $s_2'$ both “appear” on the left side with
the same algebraic sign; this may be interpreted as demonstrating an “equivalence”
of the object and image. The reversibility of rays implies that the roles of object
and image may be exchanged. The object and image may be considered to be maps
of each other, with the mapping function defined by the lens. Corresponding object
and image points (or object and image lines or object and image planes) are called
conjugate points (or lines or planes).


10.2.1  Examples: 


1.  Plano-convex  lens,  curved  side  forward: 


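$$R_1 = |R_1| > 0$$

$$R_2 = \pm\infty \quad \text{(sign has no effect)}$$

$$\frac{1}{s_1} + \frac{1}{s_2'} = (n_2 - 1)\left(\frac{1}{|R_1|} - \frac{1}{\infty}\right) = \frac{n_2 - 1}{R_1} > 0$$

If $s_1 = +\infty$, then $s_2' = f > 0$, the focal length:

$$\frac{1}{f} = \frac{n_2 - 1}{R_1} = \phi, \text{ the power (measured in m}^{-1} = \text{diopters)}$$

$$f = \frac{R_1}{n_2 - 1} \cong 2R_1 \quad \text{(since } n_2 \cong 1.5 \text{ for glass)}$$

We often use the “power” $\phi = f^{-1}$ to describe the lens, since powers add
(simpler than adding reciprocals of focal lengths).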


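2. Plano-convex lens, plane side forward:

$$R_1 = \pm\infty$$

$$R_2 = -|R_2| < 0$$

$$\frac{1}{s_1} + \frac{1}{s_2'} = (n_2 - 1)\left(\frac{1}{\infty} - \frac{1}{R_2}\right) = -\frac{n_2 - 1}{R_2} = \frac{n_2 - 1}{|R_2|}$$

$$f = \frac{|R_2|}{n_2 - 1} \cong 2\,|R_2| > 0$$

3. Plano-concave, plane side forward:

$$R_1 = \pm\infty$$

$$R_2 = +|R_2| > 0$$

$$\frac{1}{s_1} + \frac{1}{s_2'} = (n_2 - 1)\left(\frac{1}{\infty} - \frac{1}{|R_2|}\right) = -\frac{n_2 - 1}{|R_2|} < 0$$

$$f = -\frac{|R_2|}{n_2 - 1} \cong -2\,|R_2| < 0$$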


4.  Double  convex  lens  with  equal  radii: 


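$$R_1 = |R| > 0$$

$$R_2 = -R_1 = -|R|$$

$$\frac{1}{s_1} + \frac{1}{s_2'} = (n_2 - 1)\left(\frac{1}{|R|} + \frac{1}{|R|}\right) = \frac{2\,(n_2 - 1)}{|R|} > 0$$

$$\frac{1}{f} = \phi = \frac{2\,(n_2 - 1)}{|R|}$$

$$f = \frac{|R|}{2\,(n_2 - 1)} \cong |R| > 0 \quad \text{(since } n_2 \cong 1.5)$$

$f \cong |R|$ for an equiconvex glass lens.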
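
The four examples can be reproduced with a few lines of code. The sketch below assumes a thin lens in air and the sign convention of this section (a radius is positive when the center of curvature lies to the right of the vertex); the function name and the 100 mm radius are illustrative choices, and a plane surface is represented by an infinite radius.

```python
import math

def thin_lens_focal_length(n2, R1, R2):
    """Focal length of a thin lens in air from 1/f = (n2 - 1) * (1/R1 - 1/R2).
    Pass math.inf for a plane surface."""
    power = (n2 - 1.0) * (1.0 / R1 - 1.0 / R2)
    return math.inf if power == 0.0 else 1.0 / power

n_glass = 1.5   # assumed index of the glass
R = 100.0       # assumed radius magnitude in mm

print(thin_lens_focal_length(n_glass, R, math.inf))    # plano-convex, curved side forward
print(thin_lens_focal_length(n_glass, math.inf, -R))   # plano-convex, plane side forward
print(thin_lens_focal_length(n_glass, math.inf, R))    # plano-concave, plane side forward
print(thin_lens_focal_length(n_glass, R, -R))          # equiconvex
```

With an index of 1.5 the first two cases give roughly twice the radius, the third gives the same magnitude with a negative sign, and the equiconvex lens gives roughly the radius itself.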


10.3  Magnifications 

The most common use for a lens is to change the apparent size of an object (or
image) via the magnifying properties of the lens. The mapping of object space to
image space “distorts” the size and shape of the image, i.e., some regions of the
image are larger and some are smaller than the original object. We can define two
types of magnification: transverse and longitudinal.

10.3.1 Transverse Magnification:

The transverse magnification $M_T$ is what we usually think of as magnification; it is
the ratio of image to object dimension measured transverse to the optical axis.
From the similar triangles $\triangle a_1 b_1 c$ and $\triangle a_2 b_2 c$, we see that:

$$\frac{y_1}{s_1} = \frac{|y_2|}{s_2} = -\frac{y_2}{s_2} \quad \text{because } y_2 < 0$$

$$M_T \equiv \frac{y_2}{y_1} = -\frac{s_2}{s_1}$$

if $|M_T| > 1$, the image is magnified
if $|M_T| < 1$, the image is minified
$M_T < 0 \Longrightarrow$ the image is inverted
$M_T > 0 \Longrightarrow$ the image is upright (or erect)

Consider the case where the object is located at the lens, so that $s_1 = 0$. From the
imaging equation, we can find $s_2$:

$$s_2 = \left(\frac{1}{f} - \frac{1}{s_1}\right)^{-1} \to 0 \quad \text{as } s_1 \to 0$$

In words, the image distance also is 0. The transverse magnification is not well defined
from the equation, but the distances show that object, lens, and image all coincide,
which leads to the observation that $M_T = +1$.


10.3.2  Longitudinal  Magnification: 


The longitudinal magnification $M_L$ is the ratio of the length of the image along the
optical axis to the corresponding length of the object. Since the inverse distances are
related in the paraxial imaging equation ($s_1^{-1}$ and $s_2^{-1}$), the longitudinal magnification
varies for different object distances. The longitudinal magnification is the ratio of
differential (infinitesimal) elements of length of the image and object:

$$M_L = \frac{\Delta s_2}{\Delta s_1} \to \frac{ds_2}{ds_1}$$

If evaluated at a single on-axis point (so that $\Delta s_1 \to 0$), then the infinitesimal quantities
are related and the longitudinal magnification is the derivative of the image distance
with respect to the object distance:

$$M_L\big|_{s_1} = \frac{ds_2}{ds_1}$$

The expression may be derived by evaluating the total derivative of the imaging
equation:

$$d\left(\frac{1}{s_1} + \frac{1}{s_2}\right) = d\left(\frac{1}{f}\right) = 0 \quad \text{(because the terms on the RHS are constants)}$$

$$\Longrightarrow -\frac{ds_1}{s_1^2} - \frac{ds_2}{s_2^2} = 0$$

$$\Longrightarrow M_L = \frac{ds_2}{ds_1} = -\left(\frac{s_2}{s_1}\right)^2 = -(M_T)^2 < 0$$

The longitudinal magnification of a positive lens is negative because the image moves
away from the lens (increasing $s_2$) as the object moves towards the lens (decreasing
$s_1$). The longitudinal magnification also affects the irradiance of the image (i.e., the
“flux density” of the rays at the image). If $|M_L|$ is large, then the light in the vicinity
of an on-axis location is “spread out” over a larger region of space at the image, so
the irradiance of the image is decreased.
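
The relation between the two magnifications can be checked numerically by differentiating the imaging equation with a finite difference. The sketch below assumes a thin lens in air; the function names, the step size, and the sample focal length and object distance are illustrative assumptions.

```python
def image_distance(s1, f):
    """Image distance from the thin-lens equation 1/s1 + 1/s2 = 1/f."""
    return 1.0 / (1.0 / f - 1.0 / s1)

def transverse_magnification(s1, f):
    """M_T = -s2/s1."""
    return -image_distance(s1, f) / s1

def longitudinal_magnification(s1, f, ds=1e-4):
    """M_L = ds2/ds1, estimated with a central finite difference."""
    return (image_distance(s1 + ds, f) - image_distance(s1 - ds, f)) / (2.0 * ds)

f = 100.0    # assumed focal length (mm)
s1 = 150.0   # assumed object distance (mm)

m_t = transverse_magnification(s1, f)
m_l = longitudinal_magnification(s1, f)
print(m_t, m_l, -m_t ** 2)   # m_l should be close to -(m_t squared)
```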


The scaling of the 3-D “image” along the three axes. The scaling along the
“transverse” axes x and y defines the transverse magnification, while the scaling of
the image along the z-axis is determined by the longitudinal magnification.

The section of the rod located at $s_1 = 2f$ is imaged with unit transverse magnification
at $s_2 = 2f$. Sections of the rod with $s_1 > 2f$ are imaged “closer” to the lens ($s_2 < 2f$), and the
energy density is remapped to account for the nonlinear distance relationship

$$\frac{1}{s_1} + \frac{1}{s_2} = \frac{1}{f}$$


10.4  Spherical  Mirrors 

Consider a mirror with a spherical reflective surface. We can define a power of the
surface that is analogous to the power (and thus related to the focal length) of a thin
lens. A ray that passes through the center of curvature of the spherical mirror must
intersect the surface at right angles, so that the incident angle measured relative to
the normal is $\theta = 0$. The reflected angle also is $\theta = 0$ and thus the reflected ray
also passes through the center of curvature. A ray that crosses the optical axis at O
such that $\overline{OC} > 0$ makes an angle of $-\theta \neq 0$ with the surface normal. It therefore
is reflected at an angle $+\theta$ measured relative to the normal and intersects the axis at
O' such that $\overline{O'C} = -\overline{CO} \neq 0$. A paraxial ray from an object at $\infty$ makes a small
angle $-\theta \cong 0$ and reflects at $+\theta \cong 0$. The reflected paraxial ray intersects the optical
axis at a distance $\overline{VF'} = \overline{CV} \cdot \frac{1}{2} = f$.

Ray redirection by a spherical mirror: (a) an incoming ray through the center of
curvature C produces an exiting ray that also passes through C because the ray
angles are 0; (b) a ray that just misses C intersects the mirror at a ray angle of $-\theta$
and produces a ray at an angle of $+\theta$ that misses C to the opposite side; (c) a
paraxial ray from an object at infinity intersects the surface at an angle $-\theta \cong 0$ and
is reflected at $+\theta$ to cross the optical axis at F', which is halfway between C and V,
which shows that $f = -\frac{R}{2}$ for paraxial rays.

Now consider the case where a nonparaxial ray from an object at $\infty$ reaches the mirror
and makes a large angle measured from the surface normal. The ray is reflected at
the same large angle $\theta$ and intersects the optical axis at O' as shown.


Ray diagram that illustrates spherical aberration of a spherical mirror. The
nonparaxial ray intersects the mirror at P and makes an angle of $-\theta$ measured
from the radius from C. The image distance $\overline{V'F'} = f$, which varies with the ray
height h.


The distance $\overline{CF'}$ may be evaluated from $\triangle PF'C$ using the law of sines:

$$\frac{R}{\sin[\pi - 2\theta]} = \frac{\overline{CF'}}{\sin[\theta]} \Longrightarrow \overline{CF'} = R\,\frac{\sin[\theta]}{\sin[\pi - 2\theta]} = R\,\frac{\sin[\theta]}{\sin[2\theta]}$$

$$\overline{V'F'} = f = \overline{CF'} - R < 0$$

$$f = R\left(\frac{\sin[\theta]}{\sin[2\theta]} - 1\right)$$

We can evaluate this in the limit $\sin[\theta] \to \theta$, $\sin[2\theta] \to 2\theta$ to get the focal length of
the mirror in the paraxial case:

$$\text{paraxial case: } f \to R\left(\frac{1}{2} - 1\right) = -\frac{R}{2}$$

The paraxial focal length of a spherical mirror is $f = -\frac{R}{2}$, where the negative sign
corrects for the observation that a spherical mirror with $R < 0$ makes light converge and
thus has positive power. The focal length of a spherical mirror can be put in the same
form as a refracting surface by setting the second index of refraction $n_2 = -1$:

$$\frac{1}{f} = -\frac{2}{R} = \frac{(n_2 - n_1)}{R} \quad \text{with } n_1 = +1 \text{ and } n_2 = -1$$

$R < 0$: concave mirror $\Longrightarrow f > 0$
$R > 0$: convex mirror $\Longrightarrow f < 0$

Note that the refractive index of the medium has no effect on the power of a spherical
mirror because Snell’s law for reflection does not include any contribution by n. In other
words, the reflected ray angle is not affected by the index of refraction of the medium
“ahead of” the mirror.

Consider an example of a large angle, e.g., $\theta = \frac{\pi}{6}$. The focal length for this ray is:

$$f = R\left(\frac{\sin\left[\frac{\pi}{6}\right]}{\sin\left[\frac{\pi}{3}\right]} - 1\right) \cong -0.423\,R$$

In words, the focal point of a spherical mirror for paraxial rays is different from the
focal point for nonparaxial rays, which means that rays from the same object that
are collected at different ray angles do not converge to a sharp focus, thus degrading
the quality of the image. This effect is spherical aberration. Probably the most
famous example of an imaging mirror with spherical aberration is the Hubble Space
Telescope before adding the COSTAR optical corrector system.
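
The dependence of the mirror focal length on ray angle is easy to tabulate. The sketch below simply evaluates the expression $f = R\,(\sin[\theta]/\sin[2\theta] - 1)$; the function name and the choice $R = -1$ (a concave mirror with unit radius in this sign convention) are assumptions for illustration.

```python
import math

def mirror_focal_length(theta, R):
    """f = R * (sin(theta)/sin(2*theta) - 1) for a ray from infinity that meets
    the mirror at incidence angle theta (radians); theta = 0 is the paraxial limit."""
    if theta == 0.0:
        return -R / 2.0
    return R * (math.sin(theta) / math.sin(2.0 * theta) - 1.0)

R = -1.0   # assumed concave mirror with unit radius of curvature

# The focal length shrinks as the incidence angle grows (spherical aberration).
for deg in (0, 5, 10, 20, 30):
    print(deg, mirror_focal_length(math.radians(deg), R))
```

For the 30 degree ray the value is about 0.423 in units of the radius magnitude, compared with 0.5 for the paraxial ray.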


Rays from an object at infinity at different ray angles (or equivalently, at different
ray heights at the mirror) cross the optical axis at different locations. The paraxial
rays intersect the optical axis at F' such that $f = -\frac{R}{2}$, but the focal point moves
toward V as the ray height increases.

A graph of the focal length for different ray angles is shown:

The variation in focal length $\overline{V'F'} = f = \overline{CF'} - R$ with incident ray angle $\theta > 0$
(horizontal axis: ray angle $\theta$ in degrees), showing the decrease in the focal length with
increasing angle. This means that the focus for rays making a larger angle is positioned
closer to the vertex of the mirror.


Spherical  mirrors  can  create  a  good  image  of  an  object  at  the  center  of  curvature 
C,  though  this  is  not  very  useful  because  the  image  also  is  located  at  C.  As  the  object 
distance  increases,  the  spherical  aberration  of  the  mirror  becomes  more  apparent,  so 
that  the  aperture  diameter  of  a  spherical  mirror  typically  must  be  small  to  obtain 
good  image  quality.  If  the  object  is  located  at  oo,  a  spherical  mirror  cannot  give  good 
quality  unless  the  aperture  size  is  quite  small  compared  to  the  radius  of  curvature. 
Mathematically,  it  is  easy  to  show  that  the  appropriate  mirror  shape  for  imaging  at 
infinity  is  a  paraboloid:  parallel  rays  from  an  object  at  oo  reflect  from  a  paraboloidal 
surface  and  converge  to  a  single  image  point. 


10.5  Systems  of  Thin  Lenses 


The images produced by systems of thin lenses may be located by finding the “intermediate”
images produced by the individual lenses, which then become the objects
for the next lens in the sequence. This type of analysis also is directly applicable to
the “thick” lens where the surfaces take the places of the individual thin lenses. The
object is labelled O and the corresponding image O', the object- and image-space
focal points are F and F', and the object- and image-space vertices (first and
last surfaces of the system) are V and V'.

Imaging by a system of two thin lenses $L_1$ and $L_2$ separated by the distance d. The
object and image distances for the first lens are $s_1$ and $s_1'$ and for the second lens
are $s_2$ and $s_2'$.

In this particular case, the lenses are separated by the distance $t$ and the object
distance for the second lens is $s_2 = t - s_1'$. The imaging equation for the first lens
determines $s_1'$:

$$\frac{1}{s_1} + \frac{1}{s_1'} = \frac{1}{f_1} \Longrightarrow \frac{1}{s_1'} = \frac{1}{f_1} - \frac{1}{s_1} = \frac{s_1 - f_1}{s_1 f_1} \Longrightarrow s_1' = \frac{s_1 f_1}{s_1 - f_1}$$

So the object distance to the second lens is $s_2$:

$$s_2 = t - s_1' = t - \frac{s_1 f_1}{s_1 - f_1} = \frac{s_1 t - f_1 t - s_1 f_1}{s_1 - f_1} = \frac{s_1 (t - f_1) - f_1 t}{s_1 - f_1}$$

Now apply the imaging equation to the second lens and substitute for $s_2$:

$$\frac{1}{s_2} + \frac{1}{s_2'} = \frac{1}{f_2} \Longrightarrow \frac{1}{s_2'} = \frac{1}{f_2} - \frac{1}{s_2} = \frac{1}{f_2} - \frac{s_1 - f_1}{s_1 (t - f_1) - f_1 t}$$

$$= \frac{s_1 (t - f_1) - f_1 t - f_2 (s_1 - f_1)}{f_2 \left(s_1 (t - f_1) - f_1 t\right)} = \frac{t\,(s_1 - f_1) - s_1 (f_1 + f_2) + f_1 f_2}{f_2 \left(t\,(s_1 - f_1) - s_1 f_1\right)}$$

$$\Longrightarrow s_2' = \frac{f_2 \left(s_1 (t - f_1) - f_1 t\right)}{s_1 (t - f_1) - f_1 t - f_2 (s_1 - f_1)}$$

This complicated expression determines the image distance from the second lens given
the focal lengths, the “thickness” (distance between the lenses), and the object distance
$s_1$.
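
The cascade can be written as a short calculation and compared against the closed-form expression for $s_2'$. This is a sketch under the thin-lens assumptions of this section; the function names and the sample values are illustrative, and no special handling is included for an object at a focal point or at infinity.

```python
def thin_lens_image(s, f):
    """Image distance s' from 1/s + 1/s' = 1/f."""
    return 1.0 / (1.0 / f - 1.0 / s)

def two_lens_image_sequential(s1, f1, f2, t):
    """Trace through both lenses: the image of lens 1 is the object of lens 2."""
    s1_prime = thin_lens_image(s1, f1)
    s2 = t - s1_prime            # object distance for the second lens
    return thin_lens_image(s2, f2)

def two_lens_image_closed_form(s1, f1, f2, t):
    """Closed-form s2' obtained by substituting s2 into the second imaging equation."""
    numerator = f2 * (s1 * (t - f1) - f1 * t)
    denominator = s1 * (t - f1) - f1 * t - f2 * (s1 - f1)
    return numerator / denominator

# Assumed sample values (mm).
s1, f1, f2, t = 300.0, 100.0, 50.0, 75.0
print(two_lens_image_sequential(s1, f1, f2, t))
print(two_lens_image_closed_form(s1, f1, f2, t))
```

The two results should agree, which is a useful check on the algebra when the expression is rearranged.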


10.5.1  Back  Focal  Distance 

We would like to collect the results into a single simple equation that is analogous
to the imaging equation for the single thin lens. The back focal distance (BFD) is $s_2'$
for $s_1 = +\infty$. In other words, it is the distance from the image-space vertex to the
image-space focal point: $BFD = \overline{V'F'}$. Note that it is NOT the focal length of the
system. We’ll further analyze the difference between $f_{eff}$ and BFD shortly.

$$BFD = \overline{V'F'} = \lim_{s_1 \to \infty} s_2' = \lim_{s_1 \to \infty} \frac{f_2 (t - f_1) - \dfrac{f_1 f_2 t}{s_1}}{(t - f_1) - \dfrac{f_1 t}{s_1} - f_2\left(1 - \dfrac{f_1}{s_1}\right)} = \frac{f_2 (t - f_1)}{t - (f_1 + f_2)}$$

Note that if $t = f_1 + f_2$ then the BFD is $+\infty$, so the object and image are both an
infinite distance from the system. Such a system has an infinite focal length, and
thus its power (reciprocal of the focal length) is zero.

10.5.2 Front Focal Distance


Similarly, the front focal distance (FFD) is $s_1$ if $s_2' = \infty$ and is identical to $\overline{FV}$. It is
calculated by setting the denominator of the expression for $s_2'$ to zero:

$$(t - f_2) - \frac{s_1 f_1}{s_1 - f_1} = 0$$

$$\frac{s_1 f_1}{s_1 - f_1} = t - f_2$$

$$s_1 f_1 = (t - f_2)(s_1 - f_1)$$

$$s_1 f_1 = t s_1 - t f_1 - s_1 f_2 + f_1 f_2$$

$$s_1 (f_1 + f_2 - t) = f_1 f_2 - t f_1$$

$$\overline{FV} = s_1 = \frac{f_1 (t - f_2)}{t - (f_1 + f_2)} = FFD$$

Note that this expression has the same form as the back focal distance except that $f_1$
and $f_2$ are “swapped”; this makes intuitive sense, because the only difference between
the two cases is that the two lenses are exchanged.
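
The two focal distances and their symmetry under exchange of the lenses can be checked directly. The sketch below just evaluates the closed-form expressions; the function names and the sample values are assumptions for illustration.

```python
def back_focal_distance(f1, f2, t):
    """BFD = f2 * (t - f1) / (t - (f1 + f2)), measured from the last lens."""
    return f2 * (t - f1) / (t - (f1 + f2))

def front_focal_distance(f1, f2, t):
    """FFD = f1 * (t - f2) / (t - (f1 + f2)), measured from the first lens."""
    return f1 * (t - f2) / (t - (f1 + f2))

# Assumed sample system (mm).
f1, f2, t = 100.0, 50.0, 75.0
bfd = back_focal_distance(f1, f2, t)
ffd = front_focal_distance(f1, f2, t)
print(bfd, ffd)

# Exchanging the two lenses swaps the roles of the two distances.
print(front_focal_distance(f2, f1, t), back_focal_distance(f2, f1, t))
```

Neither function guards against $t = f_1 + f_2$, where both distances are infinite.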


10.5.3  Thin  Lenses  in  Contact 


If the two thin lenses are in contact, then $t = 0$ and the focal distances are equal to
the focal length of the “equivalent single lens.” We can calculate its value by setting
$t = 0$ in the equations for FFD and BFD:

$$FFD\big|_{t=0} = \frac{f_1 (0 - f_2)}{0 - (f_1 + f_2)} = \frac{f_1 f_2}{f_1 + f_2}$$

$$BFD\big|_{t=0} = \frac{f_2 (0 - f_1)}{0 - (f_1 + f_2)} = \frac{f_1 f_2}{f_1 + f_2}$$

$$f_{eff} = \frac{f_1 f_2}{f_1 + f_2}$$

$$\frac{1}{f_{eff}} = \frac{1}{f_1} + \frac{1}{f_2} \quad \text{if } t = 0$$

Two thin positive lenses in contact. The focal length of the system is shorter than
the focal lengths of either, and may be evaluated to see that $f_{eff} = \frac{f_1 f_2}{f_1 + f_2}$. The
image-space principal point H' is the location of the “equivalent thin lens”.


10.5.4  “Effective  Focal  Length”  of  a  System  with  Two  Lenses 


In the general system created from two positive lenses, we need to evaluate the focal
length $f$ (or, equivalently, the power $\phi$) as the “thickness” $t$ between the lenses is
increased from zero. We just showed that:

$$\lim_{t \to 0} \frac{1}{f_{eff}} = \frac{1}{f_1} + \frac{1}{f_2}$$

where we are assuming that the two focal lengths are positive. If $t$ is increased from
0 until the lenses are separated by the sum of the focal lengths, then an incoming ray
parallel to the axis exits parallel to the axis; we have formed a “telescope.” Since the
angles relative to the axis of the incoming and outgoing rays have not changed, then
this system has no power ($\phi = 0$); its focal length is infinite:

$$\lim_{t \to f_1 + f_2} \frac{1}{f_{eff}} = 0 \Longrightarrow f_{eff} = \infty$$

Thus the focal length of the system increased (the system power decreased) as $t$
increased from zero. This leads us to suspect that the power of the two-lens system
must have a form like:

$$\frac{1}{f_{eff}} = \frac{1}{f_1} + \frac{1}{f_2} - \alpha t$$

where $\alpha$ is some constant with units of (length)$^{-2}$. Since the only parameters used
in the system are $f_1$ and $f_2$, which have dimensions of length, we might hypothesize
that the power of the system is:

$$\phi = \frac{1}{f_1} + \frac{1}{f_2} - \frac{t}{f_1 f_2} = \phi_1 + \phi_2 - \phi_1 \phi_2 t = \frac{(f_1 + f_2) - t}{f_1 f_2}$$

The effective focal length of the system is the reciprocal of its power:

$$f_{eff} = \frac{1}{\phi} = \frac{f_1 f_2}{(f_1 + f_2) - t}$$

It can be interpreted as the focal length of the single thin lens that generates the same
outgoing ray.

The power $\phi$ is measured in diopters [m$^{-1}$] if all distances are measured in m. This
expression satisfies the two limiting cases we proposed. If the two lenses have positive
power and the separation is just less than the sum of focal lengths, the effective focal
length can be very large. This is also the case if one of the two lenses has negative
power (so that the numerator is negative) and the separation is just larger than the
sum of the focal lengths (so that the denominator is just smaller than zero).
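
The hypothesized expression can be exercised at the limiting cases it was built to satisfy. The sketch below is only illustrative: the function names and the sample focal lengths are assumptions, and the afocal case is reported as an infinite focal length rather than raising a division error.

```python
import math

def effective_focal_length(f1, f2, t):
    """f_eff = f1*f2 / ((f1 + f2) - t); infinite when t equals f1 + f2."""
    denominator = (f1 + f2) - t
    if denominator == 0.0:
        return math.inf
    return f1 * f2 / denominator

def system_power(f1, f2, t):
    """phi = 1/f1 + 1/f2 - t/(f1*f2); in diopters when lengths are in meters."""
    return 1.0 / f1 + 1.0 / f2 - t / (f1 * f2)

f1, f2 = 100.0, 50.0                              # assumed focal lengths (mm)
print(effective_focal_length(f1, f2, 0.0))        # lenses in contact
print(effective_focal_length(f1, f2, 75.0))       # separated by less than f1 + f2
print(effective_focal_length(f1, f2, f1 + f2))    # afocal "telescope"
print(system_power(0.100, 0.050, 0.075))          # same system in meters, diopters
```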


10.5.5 Positive Lenses Separated by $t < f_1 + f_2$

If two positive thin lenses are separated by less than the sum of the focal lengths,
the image-space focal point F' is closer to the first lens than it would have been had
the second lens been absent. As shown, the effective focal length of the system is
$f_{eff} < f_1$. We can apply the equation for $f_{eff}$ to this case to see that:

$$f_{eff} = \frac{f_1 f_2}{(f_1 + f_2) - t} > 0 \quad \text{for } 0 \le t < f_1 + f_2$$

As just stated, the effective focal length $f_{eff}$ of the system determines the location of
the single thin lens that is “equivalent” to the system in image space. A corresponding
point exists in object space that will be located next. The equivalent thin lens has
the same focal length $f_{eff}$ as the two-lens system but is located at the point labeled
by H' in the drawing. H' is the image-space principal point, and will be discussed in
more detail shortly.

A pair of positive thin lenses separated by less than the sum of the focal lengths.


Example: 


Consider  a  specific  example  with  fi  =  100  mm,  f2  =  50  mm,  and  t  =  75  mm.  The 
focal  length  of  the  equivalent  single  lens  is: 


feff  — 


/1/2 


( 100  mm)  (50  mm) 


(/1  +  f 2)  —  t  (100  mm +50  mm)  —  75  mm 
The  image  from  the  first  lens  is  formed  at  its  focal  point: 


200  2 

- mm  =  66  -  mm 

3  3 


si  = 


1 

h 


i 


-1 


1 


1 

00 


-1 


=  100  mm 


\  100  mm 

The  object  distance  to  the  second  lens  is  therefore  the  difference  t  —  s[: 

S2  =  t  —  s'i  =  (75  —  100)  mm  =  —25  mm 


The image of an object located at $s_1 = \infty$ is therefore formed by the second lens at:

$$s_2' = \left(\frac{1}{f_2} - \frac{1}{s_2}\right)^{-1} = \left(\frac{1}{50\text{ mm}} - \frac{1}{-25\text{ mm}}\right)^{-1} = \frac{50}{3}\text{ mm} = 16\tfrac{2}{3}\text{ mm}$$

measured from the rear vertex V' of the system. We already know that the system
focal length is $66\tfrac{2}{3}$ mm, so the image-space principal point H' (the position of the
equivalent thin lens) is located $66\tfrac{2}{3}$ mm IN FRONT of the system focal point, i.e.,
50 mm in front of the second lens and 25 mm behind the first lens. We can locate this
point by continuing the object ray “forward” through the system and the image ray
“backward” until they intersect; this is the location of the equivalent single thin lens
that creates the same image point (this location is H'). The effective focal length is
the distance from H' to F', the image of an object at infinity.
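The numbers in this example can be checked with a short script. The following is a minimal sketch, assuming two thin lenses in air; the function name `two_lens_system` and its return convention are illustrative choices, not part of the original notes. Exact fractions are used so that values such as 200/3 mm are reproduced without rounding.

```python
from fractions import Fraction


def two_lens_system(f1, f2, t):
    """Paraxial properties of two thin lenses in air separated by t.

    Returns (f_eff, bfd, vprime_to_hprime):
      f_eff            effective focal length H'F'
      bfd              back focal distance V'F' (second lens to F')
      vprime_to_hprime signed distance from the second lens to H'
                       (negative means H' lies in front of the second lens)
    """
    f1, f2, t = Fraction(f1), Fraction(f2), Fraction(t)
    denom = (f1 + f2) - t
    if denom == 0:
        raise ValueError("t = f1 + f2: the system is afocal (infinite focal length)")
    f_eff = f1 * f2 / denom
    bfd = f2 * (f1 - t) / denom
    return f_eff, bfd, bfd - f_eff


f_eff, bfd, vh_prime = two_lens_system(100, 50, 75)
print(f"f_eff = {float(f_eff):.2f} mm, V'F' = {float(bfd):.2f} mm, V'H' = {float(vh_prime):.2f} mm")
```

With the values of the example this gives an effective focal length of 200/3 mm, a back focal distance of 50/3 mm, and H' located 50 mm in front of the second lens, in agreement with the hand calculation.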
Object-Space Principal Point

We have already shown how to find the location of the equivalent single lens on the
“output side” by extending the rays entering and exiting the system until they meet.
We can locate the equivalent single lens in “object space” by “reversing” the system,
as shown in the figure. The “first” lens in the system is now L2 with $f_2 = 50$ mm.
The “second” lens is L1 with $f_1 = 100$ mm and the separation is $t = 75$ mm. The
resulting effective focal length remains unchanged at $f_{\text{eff}} = \frac{200}{3}$ mm $= 66\tfrac{2}{3}$ mm. If we
bring in a ray from an object at ∞, the “intermediate” image formed by L2 is located
at the focal point of L2:

$$s_1' = \left(\frac{1}{f_2} - \frac{1}{s_1}\right)^{-1} = \left(\frac{1}{50\text{ mm}} - \frac{1}{\infty}\right)^{-1} = 50\text{ mm}$$

Thus the object distance to L1 is:

$$s_2 = t - s_1' = 75\text{ mm} - 50\text{ mm} = +25\text{ mm}$$

The image of the object at $s_1 = \infty$ produced by the entire system is located at $s_2'$:

$$s_2' = \left(\frac{1}{f_1} - \frac{1}{s_2}\right)^{-1} = \left(\frac{1}{100\text{ mm}} - \frac{1}{+25\text{ mm}}\right)^{-1} = -\frac{100}{3}\text{ mm} = -33\tfrac{1}{3}\text{ mm}$$

measured from the “second” lens L1 (or equivalently from the second vertex). The
image is “in front” of the second lens (on the object-space side) and thus is virtual. The
object-space principal point H is the point such that the distance $HF = f_{\text{eff}} = 66\tfrac{2}{3}$ mm,
so in the reversed system H is located 100 mm IN FRONT of L1, i.e., 25 mm in front of L2.


The principal and focal points of the two-lens imaging system in both object and image spaces.

When we “re-reverse” the system to graph the object- and image-space principal
points, H is located “behind” the lens L2, as shown in the graphical rendering of the
entire system. The object-space principal point is the location of the equivalent thin
lens if the imaging system is reversed.

Two-lens system showing the object- and image-space principal points H and H' and focal points F and F'.

For a system of two thin lenses in contact, the principal points coincide with the
common location of the two lenses, i.e., V' = H' = H = V.

We can now use these locations of the equivalent thin lens in the two spaces to locate
the images by applying the thin-lens (Gaussian) imaging equation. HOWEVER,
it is VERY important to realize that the distances s and s' are respectively measured
from the object O to the object-space principal point H and from the image-space
principal point H' to the image point O':

$$s = \overline{OH}$$

$$s' = \overline{H'O'}$$

The  process  is  demonstrated  after  first  locating  the  images  via  a  direct  calculation. 


Imaging  Equation  for  Equivalent  Single  Lens  (“Brute  Force”  Calculation) 

Now consider the location and magnification of the image created by the original
two-lens imaging system (with L1 in front) for an object located 1000 mm in front of
the system (so that OV = 1000 mm). We can locate the image step by step:

Intermediate image created by L1:

$$s_1' = \left(\frac{1}{100\text{ mm}} - \frac{1}{1000\text{ mm}}\right)^{-1} = \frac{1000}{9}\text{ mm} = 111.11\text{ mm}$$

Transverse magnification of first image:

$$(M_T)_1 = -\frac{s_1'}{s_1} = -\frac{\frac{1000}{9}\text{ mm}}{1000\text{ mm}} = -\frac{1}{9}$$

Distance from first image to L2:

$$s_2 = t - s_1' = 75\text{ mm} - \frac{1000}{9}\text{ mm} = -\frac{325}{9}\text{ mm} = -36.11\text{ mm}$$

Distance from L2 to final image:

$$s_2' = \left(\frac{1}{f_2} - \frac{1}{s_2}\right)^{-1} = \left(\frac{1}{50\text{ mm}} + \frac{9}{325\text{ mm}}\right)^{-1} = \frac{650}{31}\text{ mm} = +20.97\text{ mm}$$

Transverse magnification of second image:

$$(M_T)_2 = -\frac{s_2'}{s_2} = -\frac{\frac{650}{31}\text{ mm}}{-\frac{325}{9}\text{ mm}} = +\frac{18}{31}$$

The transverse magnification of the final image is the product of the transverse
magnifications of the images created by the two lenses:

$$M_T = (M_T)_1 \cdot (M_T)_2 = \left(-\frac{1}{9}\right)\left(+\frac{18}{31}\right) = -\frac{2}{31} \approx -0.065$$


The  transverse  magnification  indicates  that  the  image  is  minified  (or  demagnified) 
and  inverted. 


Brute-Force  Calculation  for  Object  at  H 

Now repeat the calculation for an object located at H. In this case, the object is
located “inside” the system, so the distance from the object to the first lens is:

$$s_1 = \overline{OV} = \overline{HV} = -\overline{VH} = -100\text{ mm}$$

The object in this case is virtual. The “intermediate” image created by the first lens
L1 is located at:

$$s_1' = \left(\frac{1}{f_1} - \frac{1}{s_1}\right)^{-1} = \left(\frac{1}{100\text{ mm}} - \frac{1}{-100\text{ mm}}\right)^{-1} = +50\text{ mm}$$

Note that the transverse magnification of the image created by the first lens is:

$$(M_T)_1 = -\frac{s_1'}{s_1} = -\frac{+50\text{ mm}}{-100\text{ mm}} = +\frac{1}{2}$$

The object distance to the second lens L2 is:

$$s_2 = t - s_1' = 75\text{ mm} - 50\text{ mm} = +25\text{ mm}$$

so the object for the second lens is real. The distance to the final image is:

$$s_2' = \overline{V'O'} = \left(\frac{1}{f_2} - \frac{1}{s_2}\right)^{-1} = \left(\frac{1}{50\text{ mm}} - \frac{1}{25\text{ mm}}\right)^{-1} = -50\text{ mm}$$

The image is located 50 mm “in front” of L2 (again, “inside” the system) and thus
the image of the object at H is virtual. The diagram shows that the image coincides
with the image-space principal point H', which is conveniently appropriate for our
labeling convention. The transverse magnification of the image created by the second
lens is:

$$(M_T)_2 = -\frac{s_2'}{s_2} = -\frac{-50\text{ mm}}{+25\text{ mm}} = +2$$

The transverse magnification of the final image relative to the original object is the
product of the two individual magnifications:

$$M_T = (M_T)_1 \cdot (M_T)_2 = \left(+\frac{1}{2}\right)(+2) = +1$$

In words,

An object located at H creates an image at H' with transverse magnification +1.


Ray  trace  for  object  located  at  object-space  principal  point  H  showing  that  the  image 
is  located  at  the  image-space  principal  point  H'. 
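Yet Another Example...

What if the object is located so that $\overline{OV} = \frac{100}{3}$ mm $= 33\tfrac{1}{3}$ mm? We can locate the image step by
step:

$$s_1' = \left(\frac{1}{100\text{ mm}} - \frac{1}{\frac{100}{3}\text{ mm}}\right)^{-1} = -50\text{ mm (virtual)}$$

Transverse magnification of first image:

$$(M_T)_1 = -\frac{s_1'}{s_1} = -\frac{-50\text{ mm}}{\frac{100}{3}\text{ mm}} = +\frac{3}{2}$$

Distance from first image to L2:

$$s_2 = t - s_1' = 75\text{ mm} - (-50\text{ mm}) = +125\text{ mm}$$

Distance from L2 to final image:

$$s_2' = \left(\frac{1}{f_2} - \frac{1}{s_2}\right)^{-1} = \left(\frac{1}{50\text{ mm}} - \frac{1}{125\text{ mm}}\right)^{-1} = \frac{250}{3}\text{ mm}$$

Transverse magnification of second image:

$$(M_T)_2 = -\frac{s_2'}{s_2} = -\frac{+\frac{250}{3}\text{ mm}}{+125\text{ mm}} = -\frac{2}{3}$$

The transverse magnification of the final image is the product of the transverse
magnifications of the images created by the two lenses:

$$M_T = (M_T)_1 \cdot (M_T)_2 = \left(+\frac{3}{2}\right)\left(-\frac{2}{3}\right) = -1$$

Since $M_T = -1$, we know that the object is located $2f_{\text{eff}}$ from H and the image is
located $2f_{\text{eff}}$ from H'. Indeed, $\overline{OH} = 33\tfrac{1}{3}\text{ mm} + 100\text{ mm} = 133\tfrac{1}{3}$ mm and
$\overline{H'O'} = 50\text{ mm} + 83\tfrac{1}{3}\text{ mm} = 133\tfrac{1}{3}$ mm, both equal to $2f_{\text{eff}}$.
This confirms the locations of the principal points.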
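The two special cases just worked (object at H, and object at a distance of twice the effective focal length from H) can be verified numerically. The sketch below assumes the example system of this section (100 mm and 50 mm thin lenses separated by 75 mm) as default arguments; the function name `image_through_pair` is an illustrative choice.

```python
from fractions import Fraction as F


def image_through_pair(ov, f1=100, f2=50, t=75):
    """Trace an axial object through two thin lenses, one lens at a time.

    ov is the signed distance from the object to the first lens (negative
    for a virtual object located behind the first lens).
    Returns (V'O', M_T): image distance from the second lens and the total
    transverse magnification.
    """
    ov, f1, f2, t = F(ov), F(f1), F(f2), F(t)
    s1p = 1 / (1 / f1 - 1 / ov)          # image formed by the first lens
    s2 = t - s1p                         # object distance for the second lens
    s2p = 1 / (1 / f2 - 1 / s2)          # image formed by the second lens
    return s2p, (-s1p / ov) * (-s2p / s2)


# Object at H (100 mm behind the first lens, so OV = -100 mm)
print(image_through_pair(-100))
# Object at 2*f_eff in front of H (OV = 100/3 mm)
print(image_through_pair(F(100, 3)))
```

The first call returns an image 50 mm in front of the second lens (that is, at H') with magnification +1; the second returns an image 250/3 mm behind the second lens with magnification -1.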


Imaging  Equation  for  Object-  and  Image-Space  Principal  Points 

We have just seen that the object- and image-space principal points are the points
related by unit magnification. They also are the “reference” locations from which the
system focal length is measured:

$$f_{\text{eff}} = \overline{FH} = \overline{H'F'}$$

(assuming that the object space and image space are the same medium, e.g., air). In
exactly the same way, these are the “reference” locations from which the object and
image distances are measured for a multi-element system:

$$s = \overline{OH}$$

$$s' = \overline{H'O'}$$

The ray entering the system can be modeled as traveling from the object O to the
object-space principal point H. The resulting outgoing (image) ray travels from the
image-space principal point H' to the image point O'. This may seem a little “weird”,
but actually makes perfect sense if we relate the measurements to the equation for
a single thin lens. In that situation, focal lengths are measured from the thin lens
to its focal points. In other words, the object- and image-space vertices V and V'
coincide with the principal points H and H'. We know that an object located at
the lens ($s = 0$) generates an image at the lens ($s' = 0$) with magnification of +1;
the heights of the object and image at the principal points are identical. In the
realistic system where the object- and image-space principal points are at different
locations, an object located at H still has its image at H' with unit
magnification; an object located at the object-space principal point creates an image
at the image-space principal point with the transverse magnification $M_T = +1$.

EMPHASIS: the principal points H and H' are the locations of the
object and image related by unit transverse magnification: $M_T = +1$

Contrast this to the situation if the object distance from H is $2f_{\text{eff}}$, so that the image
distance is also $2f_{\text{eff}}$ and the transverse magnification is $-1$:

$$s = \overline{OH} = 2f_{\text{eff}}$$

$$\frac{1}{s'} = \frac{1}{f_{\text{eff}}} - \frac{1}{s} = \frac{1}{2f_{\text{eff}}}$$

$$s' = \overline{H'O'} = 2f_{\text{eff}}$$

$$M_T = -\frac{s'}{s} = -1$$


The principal points are “crossed” in the imaging system considered thus far,
which merely means that the object-space principal point is “behind” the image-space
principal point (towards the image-space side of the lens system). Any ray cast
into the system from the object point O to H creates an image ray that departs from
H' at the same height ($M_T = +1$) and is directed towards the image point O':

Principal points of an imaging system: The dashed ray from the object at O reaches
the object-space principal point H with height h. The image ray (solid line) departs
from the image-space principal point H' with the same height h and goes to the
image point O', so that the distances OH = s and H'O' = s' satisfy the imaging
equation $\frac{1}{s} + \frac{1}{s'} = \frac{1}{f_{\text{eff}}}$.


Location  of  Image  from  System  via  Principal  Points 

We  can  also  solve  this  problem  by  using  the  equivalent  single  thin  lens  where  the 
distances  are  measured  from  the  object-  and  image-space  principal  points.  We  have 
already  shown  that  the  system  (effective)  focal  length  is: 

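$$f_{\text{eff}} = \frac{200}{3}\text{ mm}$$

[Figure: the two-lens system with object O, vertices V and V', principal points H and H', and image O'. The labeled distances are VH = 100 mm, so that s = OH = OV + 100 mm, and H'V' = 50 mm, so that s' = H'O' = H'V' + V'O' = 50 mm + V'O'.]

The object and image distances s and s' of the single lens equivalent to the two-lens
system are respectively measured from the principal points: s = OH and s' = H'O'.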


The object distance is measured to the object-space principal point, which is 100 mm
behind L1 (or V), thus the object distance is the distance from O to L1 plus 100 mm:

$$s = \overline{OV} + \overline{VH} = 1000\text{ mm} + 100\text{ mm} = 1100\text{ mm}$$

The single-lens imaging equation may be used to find the image distance s', which
now is MEASURED FROM THE IMAGE-SPACE PRINCIPAL POINT H':

$$s' = \overline{H'O'} = \left(\frac{1}{f_{\text{eff}}} - \frac{1}{s}\right)^{-1} = \left(\frac{3}{200\text{ mm}} - \frac{1}{1100\text{ mm}}\right)^{-1} = \frac{2200}{31}\text{ mm} = 70.97\text{ mm}$$

The image distance from the vertex is calculated by subtracting the distance from
the image-space principal point H' to the image-space vertex V':

$$\overline{V'O'} = \overline{H'O'} - \overline{H'V'} = \frac{2200}{31}\text{ mm} - 50\text{ mm} = \frac{650}{31}\text{ mm} = +20.97\text{ mm}$$

The resulting magnification is:

$$M_T = -\frac{s'}{s} = -\frac{\frac{2200}{31}\text{ mm}}{1100\text{ mm}} = -\frac{2}{31} \approx -0.065$$


Note  that  both  the  image  distance  and  the  transverse  magnification  match  those 
obtained  with  the  step-by-step  calculation  performed  above. 
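The agreement between the two methods can be reproduced with a few lines of code. This is a minimal sketch, assuming thin lenses in air; the helper names are illustrative, and the expressions used for VH and H'V' are the two-lens formulas $\overline{VH} = f_1 t/(f_1+f_2-t)$ and $\overline{H'V'} = f_2 t/(f_1+f_2-t)$, which give the 100 mm and 50 mm used above.

```python
from fractions import Fraction as F


def thin_lens_image(s, f):
    """Image distance s' from 1/s + 1/s' = 1/f (raises ZeroDivisionError if s == f)."""
    return 1 / (F(1) / f - F(1) / s)


def brute_force(ov, f1, f2, t):
    """Image the object through each lens in turn; returns (V'O', M_T)."""
    s1p = thin_lens_image(ov, f1)
    s2 = t - s1p
    s2p = thin_lens_image(s2, f2)
    return s2p, (-s1p / ov) * (-s2p / s2)


def via_principal_points(ov, f1, f2, t):
    """Use the equivalent thin lens with distances measured from H and H'."""
    denom = F(f1 + f2 - t)
    f_eff = f1 * f2 / denom
    vh = f1 * t / denom        # first vertex to object-space principal point
    hpvp = f2 * t / denom      # image-space principal point to second vertex
    s = ov + vh                # s = OH
    sp = thin_lens_image(s, f_eff)   # s' = H'O'
    return sp - hpvp, -sp / s


print(brute_force(1000, 100, 50, 75))
print(via_principal_points(1000, 100, 50, 75))
```

Both calls return an image distance of 650/31 mm from the rear vertex and a magnification of -2/31.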


10.5.6  Cardinal  Points 


The object-space and image-space focal and principal points are four of the six so-called
cardinal points that completely determine the paraxial properties of an imaging
system.  There  are  three  pairs  of  locations  where  one  of  each  pair  is  in  object  space 
and  the  other  is  in  image  space.  The  object-  and  image-space  focal  points  are  F  and 
F',  while  the  principal  points  H  and  H'  are  the  locations  on  the  axis  in  object  and 
image  space  that  are  images  of  each  other  with  unit  magnification.  The  nodal  points 
N  and  N'  are  the  points  in  object  and  image  space  where  the  ray  angle  entering  the 
object-space  nodal  point  and  exiting  the  image-space  nodal  point  are  identical.  For 
systems  where  the  object  and  image  spaces  are  in  air  (most  of  the  systems  we  care 
about),  the  principal  and  nodal  points  coincide. 

A table of significant points on the axis of a paraxial system is given below:

Axial Point             Object Space       Image Space         Conjugate? (self-images?)
Focal Points            F                  F'                  No
Nodal Points            N                  N'                  Yes
Principal Points        H                  H'                  Yes (M_T = +1)
Vertices                V                  V'                  No
Object/Image            O                  O'                  Yes (M_T = -H'O'/OH = -s'/s)
Entrance/Exit Pupils    E                  E'                  Yes
“Equal Conjugates”      s = OH = 2f        s' = H'O' = 2f      Yes (M_T = -1)

10.5.7 Positive Thin Lenses Separated by t = f1 + f2

We’ve already looked at this example, but consider it one more time. If the two lenses
are separated by the sum of the focal lengths, then an object at ∞ forms an image
at ∞; the system focal length is infinite. Since the focal points are both located at
infinity, we say that the system is afocal; it has zero power, i.e., a bundle of parallel
rays that enters the system also exits it as a bundle of parallel rays. If the focal length
of the first lens is longer than that of the second, the system is a telescope.


Two thin lenses separated by the sum of their focal lengths. An object located an
infinite distance from the first lens forms an “intermediate” image at the
image-space focal point $F_1'$ of the first lens. The second lens forms an image at
infinity. Both object- and image-space focal lengths of the equivalent system are
infinite: $f = f' = \infty$. The system has “no” focal points; it is afocal.

The focal length of this system is:

$$\frac{1}{f_{\text{eff}}} = \left(\frac{1}{f_1} + \frac{1}{f_2}\right) - \frac{t}{f_1 f_2} = \left(\frac{1}{f_1} + \frac{1}{f_2}\right) - \frac{f_1 + f_2}{f_1 f_2} = \left(\frac{1}{f_1} + \frac{1}{f_2}\right) - \left(\frac{1}{f_2} + \frac{1}{f_1}\right) = 0 = \frac{1}{\infty}$$

where $t = f_1 + f_2$ is the separation between the two lenses.
10.5.8 Positive Thin Lenses Separated by t = f1 or t = f2

We now continue the sequence of examples for two positive lenses separated by different
distances. If two positive lenses are separated by the focal length of the first
lens, then the focal length of the system is:

$$f_{\text{eff}}\ (\text{for } t = f_1) = \frac{f_1 f_2}{(f_1 + f_2) - f_1} = \frac{f_1 f_2}{f_2} = f_1$$

If separated by the focal length of the second lens, the system focal length is $f_2$:

$$f_{\text{eff}}\ (\text{for } t = f_2) = \frac{f_1 f_2}{(f_1 + f_2) - f_2} = \frac{f_1 f_2}{f_1} = f_2$$


For the purpose of this example, we analyze the second case because it is the basis
for probably the most common application of imaging optics. The extension to the
first case is trivial. Since the focal length of the system is identical to the focal length
of the second lens if the lenses are separated by $t = f_2$, this raises the question of
what effect the first lens has on the image.


Effect of adding lens L1 at the object-space focal point of lens L2, so that $t = f_2$ and
$f_{\text{eff}} = f_2$. The upper sketch is the lens L2 alone, and the lower drawing shows the
situation with L1 added. The image in the second case is formed “closer” to L2.

Consider a specific case with $f_2 = 100$ mm and $f_1 = 200$ mm. If only L2 is present
and the object distance is $s_2 = 1100$ mm, the image distance is:

$$s_2' = \left(\frac{1}{100\text{ mm}} - \frac{1}{1100\text{ mm}}\right)^{-1} = 110\text{ mm}$$

The associated transverse magnification is:

$$M_T = -\frac{s_2'}{s_2} = -\frac{110\text{ mm}}{1100\text{ mm}} = -\frac{1}{10}$$
Now add L1 at the front focal point of L2 and find the associated image. The object
distance to L1 is 1100 mm − 100 mm = 1000 mm. The lens forms an image at distance:

$$s_1' = \left(\frac{1}{f_1} - \frac{1}{s_1}\right)^{-1} = \left(\frac{1}{200\text{ mm}} - \frac{1}{1000\text{ mm}}\right)^{-1} = +250\text{ mm}$$

The associated transverse magnification is:

$$(M_T)_1 = -\frac{250\text{ mm}}{1000\text{ mm}} = -\frac{1}{4}$$

The object distance to the second lens is:

$$s_2 = t - s_1' = 100\text{ mm} - 250\text{ mm} = -150\text{ mm}$$

and the resulting image distance is:

$$s_2' = \left(\frac{1}{100\text{ mm}} - \frac{1}{-150\text{ mm}}\right)^{-1} = +60\text{ mm}$$

The associated transverse magnification of the image formed by the second lens is:

$$(M_T)_2 = -\frac{60\text{ mm}}{-150\text{ mm}} = +\frac{2}{5}$$

The magnification of the system is the product of the magnifications:

$$M_T = (M_T)_1 \cdot (M_T)_2 = \left(-\frac{1}{4}\right)\left(+\frac{2}{5}\right) = -\frac{1}{10}$$

This is the same transverse magnification that we obtained from L2 alone! The
magnification is not changed by the addition of lens L1! However, the position of the
image HAS changed (from +110 mm to +60 mm); the image is closer to L2 if L1 is
added.

This  system  demonstrates  the  principle  of  eyeglass  lenses,  where  the  corrective 
lens  is  placed  at  the  object-space  focal  point  of  the  eyelens.  The  corrective  action  is 
to  move  the  image  without  changing  its  transverse  magnification. 
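The invariance of the magnification can be explored for other choices of the added lens. The sketch below assumes the geometry of this example (L2 with focal length 100 mm, object 1100 mm in front of L2, L1 placed at the front focal point of L2); the function name `image_with_front_lens` and the extra trial focal lengths (including a negative lens) are illustrative assumptions, not values from the text.

```python
from fractions import Fraction as F


def image_with_front_lens(ov2, f2, f1=None):
    """Image of an object a distance ov2 in front of L2.

    If f1 is given, a thin lens L1 of that focal length is placed at the
    front focal point of L2 (separation t = f2). Returns (s2', M_T) with
    the image distance measured from L2. The degenerate case in which L1
    images the object to infinity (object at the focal point of L1) is
    not handled.
    """
    ov2, f2 = F(ov2), F(f2)
    if f1 is None:
        s2p = 1 / (1 / f2 - 1 / ov2)
        return s2p, -s2p / ov2
    f1 = F(f1)
    s1 = ov2 - f2                      # object distance to L1
    s1p = 1 / (1 / f1 - 1 / s1)        # intermediate image
    s2 = f2 - s1p                      # object distance to L2 (t = f2)
    s2p = 1 / (1 / f2 - 1 / s2)
    return s2p, (-s1p / s1) * (-s2p / s2)


print("L2 alone:      ", image_with_front_lens(1100, 100))
for f1 in (200, 500, -200):
    print(f"with f1 = {f1:+d}:", image_with_front_lens(1100, 100, f1))
```

In each case the magnification stays at -1/10 while the image position moves (to 60 mm, 90 mm, and 160 mm from L2 for the three trial lenses), which is the behavior described for the eyeglass lens.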


10.5.9 Positive Thin Lenses Separated by t > f1 + f2


If the two positive lenses are separated by more than the sum of the focal lengths,
the focal length of the resulting system is negative:

$$f_{\text{eff}} = \frac{f_1 f_2}{(f_1 + f_2) - t} < 0$$

If the object distance is ∞, the first lens forms an “intermediate” image at its image-space
focal point, i.e., at $s_1' = f_1$. Since the object distance $s_2$ measured from the
second lens is larger than $f_2$, a “real” image is formed by the second lens at the system
focal point F'. If we extend the exiting ray until it intersects the incoming ray from
the object at infinity, we can locate the equivalent single thin lens for the system,
i.e., the image-space principal point H'. In this case, this is located farther from the
second lens than the focal point. The effective focal length $f_{\text{eff}} = \overline{H'F'} < 0$, so the
system has negative power; this system created from two positive lenses is equivalent
to a single thin lens with negative power.


The system composed of two thin lenses separated by $t > f_1 + f_2$. The image-space
focal point F' of the system is beyond the second lens, but the image-space principal
point H' is located even farther from L2. The distance $\overline{H'F'} = f_{\text{eff}} < 0$, so the
system has negative power!


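10.5.10 Systems of Two Positive Thin Lenses with Different Focal Lengths

$$f_{\text{eff}} = \overline{H'F'} = \overline{FH} = \frac{f_1 f_2}{(f_1 + f_2) - t} = \frac{100\text{ mm} \cdot 25\text{ mm}}{100\text{ mm} + 25\text{ mm} - t}$$

$$\text{BFD} = \overline{V'F'} = \frac{f_2 (f_1 - t)}{(f_1 + f_2) - t} = \frac{25\text{ mm} \cdot (100\text{ mm} - t)}{(100\text{ mm} + 25\text{ mm}) - t}$$

$$\overline{H'V'} = \overline{H'F'} - \overline{V'F'} = \frac{f_1 f_2}{(f_1 + f_2) - t} - \frac{f_2 (f_1 - t)}{(f_1 + f_2) - t} = \frac{f_2 t}{(f_1 + f_2) - t}$$

$$\text{FFD} = \overline{FV} = \frac{f_1 (f_2 - t)}{(f_1 + f_2) - t} = \frac{100\text{ mm} \cdot (25\text{ mm} - t)}{(100\text{ mm} + 25\text{ mm}) - t}$$

$$\overline{VH} = \overline{FH} - \overline{FV} = \frac{f_1 f_2}{(f_1 + f_2) - t} - \frac{f_1 (f_2 - t)}{(f_1 + f_2) - t} = \frac{f_1 t}{(f_1 + f_2) - t}$$

$$f_1 = +100\text{ mm}, \qquad f_2 = +25\text{ mm}$$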
t                       FFD             BFD             f_eff
0 mm                    +20 mm          +20 mm          +20 mm
+25 mm                  0 mm            +18.75 mm       +25 mm = f2
+50 mm                  -33 1/3 mm      +16 2/3 mm      +33 1/3 mm
+75 mm                  -100 mm         +12.5 mm        +50 mm
+100 mm                 -300 mm         0 mm            +100 mm = f1
+125 mm = f1 + f2       ∞               ∞               ∞
+150 mm                 +500 mm         +50 mm          -100 mm
+175 mm                 +300 mm         +37.5 mm        -50 mm
+200 mm                 +233 1/3 mm     +33 1/3 mm      -33 1/3 mm
+225 mm                 +200 mm         +31.25 mm       -25 mm
+250 mm                 +180 mm         +30 mm          -20 mm

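The entries of this table follow directly from the three expressions for FFD, BFD, and the effective focal length. A minimal sketch that regenerates the table is shown below; the function name `cardinal_distances` is an illustrative choice, and the afocal separation is reported as `None` rather than as an infinite value.

```python
from fractions import Fraction as F


def cardinal_distances(f1, f2, t):
    """Return (FFD, BFD, f_eff) for two thin lenses in air, or None if afocal."""
    f1, f2, t = F(f1), F(f2), F(t)
    denom = (f1 + f2) - t
    if denom == 0:
        return None                      # t = f1 + f2: all three distances are infinite
    ffd = f1 * (f2 - t) / denom          # FV, front focal distance
    bfd = f2 * (f1 - t) / denom          # V'F', back focal distance
    f_eff = f1 * f2 / denom              # FH = H'F'
    return ffd, bfd, f_eff


print(f"{'t':>6} {'FFD':>10} {'BFD':>10} {'f_eff':>10}   (all in mm)")
for t in range(0, 275, 25):
    row = cardinal_distances(100, 25, t)
    if row is None:
        print(f"{t:>6} {'inf':>10} {'inf':>10} {'inf':>10}")
    else:
        print(f"{t:>6} " + " ".join(f"{float(v):>+10.2f}" for v in row))
```

Note how the effective focal length changes sign as the separation passes through the afocal value of 125 mm, while the back focal distance remains positive on both sides.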

10.5.11  Newtonian  Form  of  Imaging  Equation 

The familiar Gaussian form of the imaging equation is:

$$\frac{1}{s} + \frac{1}{s'} = \frac{1}{f}$$

An equivalent form is obtained by defining the distances x and x' measured from the
focal points:

$$s = x + f \implies x = s - f$$

$$s' = x' + f \implies x' = s' - f$$

In the case of the real image O' for the real object O shown in the figure, both x and
x' are positive because the distances are measured from left to right:

[Figure: a thin lens with object O, focal points F and F', and image O'; the distances x (from O to F), f, f, and x' (from F' to O') are marked along the axis.]

The definition of the parameters x, x' in the Newtonian form of the imaging
equation. For a real image, both x and x' are positive.
By simple substitution into the imaging equation, we obtain:

$$\frac{1}{f} = \frac{1}{x + f} + \frac{1}{x' + f}$$

$$\frac{1}{f} = \frac{(x' + f) + (x + f)}{(x + f)(x' + f)} = \frac{(x + x') + 2f}{x x' + (x + x') f + f^2}$$

$$\implies \left[(x + x') + 2f\right] \cdot f = x x' + (x + x') f + f^2$$

$$\implies x \cdot x' = f^2$$


This  is  the  Newtonian  form  of  the  imaging  equation.  The  same  expression  applies 
for  virtual  images,  but  the  sign  of  the  distances  must  be  adjusted,  as  shown: 


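[Figure: a thin lens with the object O between F and the lens, and the virtual image O' on the object side of the lens; here x < 0 and x' < 0.]

The parameters x, x' of the Newtonian form for a virtual image.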
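The Newtonian relation can be checked numerically for both a real and a virtual image. The following sketch assumes a single thin lens with a focal length of 100 mm and two arbitrary object distances (300 mm and 50 mm) chosen only for illustration; neither value comes from the text.

```python
from fractions import Fraction as F


def newtonian_distances(s, f):
    """Return (x, x') for an object at Gaussian distance s from a thin lens.

    The image distance s' is found from 1/s + 1/s' = 1/f, then both
    distances are re-measured from the focal points: x = s - f, x' = s' - f.
    """
    s, f = F(s), F(f)
    sp = 1 / (1 / f - 1 / s)
    return s - f, sp - f


for s in (300, 50):                      # real image, then virtual image
    x, xp = newtonian_distances(s, 100)
    print(f"s = {s} mm: x = {x} mm, x' = {xp} mm, x*x' = {x * xp} mm^2")
```

In the first case both x and x' are positive; in the second (object inside the focal length) both are negative. The product equals the square of the focal length in both cases.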


10.5.12 Another System of Two Thin Lenses: the Telephoto Lens

Now consider a system composed of a positive lens and a negative lens separated by
slightly more than the sum of the focal lengths. For example, consider $f_1 = +100$ mm,
$f_2 = -25$ mm, and $t = +80$ mm $\gtrsim f_1 + f_2 = 75$ mm. The focal length of the equivalent
thin lens is easy to calculate:

$$\frac{1}{f_{\text{eff}}} = \frac{1}{f_1} + \frac{1}{f_2} - \frac{t}{f_1 f_2} = \frac{1}{100\text{ mm}} + \frac{1}{-25\text{ mm}} - \frac{80\text{ mm}}{(+100\text{ mm})(-25\text{ mm})}$$

$$f_{\text{eff}} = 500\text{ mm} \gg f_1$$
Now locate the image-space focal point and principal point. For an object located at
∞, the BFD is found by substitution into the appropriate equation:

$$\text{BFD} = \overline{V'F'} = \frac{f_2 (t - f_1)}{t - (f_1 + f_2)} = \frac{(-25\text{ mm})(80\text{ mm} - 100\text{ mm})}{80\text{ mm} - (100\text{ mm} + (-25\text{ mm}))} = +100\text{ mm}$$


The image of an object at ∞ is located 100 mm behind the second lens, and thus
180 mm behind the first lens. The physical length of the system when imaging an
object at ∞ is 180 mm, which is MUCH less than the equivalent focal length $f_{\text{eff}} =$
500 mm. This is an example of an optical system whose physical length is much
shorter than the focal length. Such a lens is useful for photographers; it is a “short”
(and portable) lens system that has a “long” focal length. Note that astronomers
also find such systems to be useful.

The location of the image-space principal point is determined from the back focal
distance and the equivalent focal length:

$$\overline{H'F'} = \overline{H'V'} + \overline{V'F'}$$

$$500\text{ mm} = \overline{H'V'} + 100\text{ mm}$$

$$\overline{H'V'} = +400\text{ mm}$$

$$\overline{H'V} = \overline{H'V'} - \overline{VV'} = 400\text{ mm} - 80\text{ mm} = +320\text{ mm}$$

so the principal point is located 320 mm in front of the object-space vertex V. A
sketch of the system and the image-space cardinal points is shown:


[Figure (not to scale): the telephoto system with $f_1 = +100$ mm and $f_2 = -25$ mm.]

Image-space focal and principal points of the telephoto system. The equivalent focal
length of the system is $f_{\text{eff}} = +500$ mm, but the image-space focal point is only
+100 mm behind the rear vertex V'. The image-space principal point is 500 mm in
front of the focal point.

The object-space focal point is located by applying the expression for the “front
focal distance”:

$$\text{FFD} = \overline{FV} = \frac{f_1 (t - f_2)}{t - (f_1 + f_2)} = \frac{(+100\text{ mm})(80\text{ mm} - (-25\text{ mm}))}{80\text{ mm} - (100\text{ mm} + (-25\text{ mm}))} = +2100\text{ mm}$$

which is far in front of the object-space vertex V. The object-space principal point
is found from:

$$\overline{FH} = \overline{FV} + \overline{VH}$$

$$+500\text{ mm} = +2100\text{ mm} + \overline{VH}$$

$$\overline{VH} = 500\text{ mm} - 2100\text{ mm} = -1600\text{ mm} \implies \overline{HV} = -\overline{VH} = +1600\text{ mm}$$

So the object-space principal point is very far in front of the first vertex.

[Figure (not to scale): the telephoto system with $f_1 = +100$ mm, $t = +80$ mm, $f_2 = -25$ mm, and FFD = +2100 mm.]

Object-space focal and principal points of the telephoto system. Both are located way
out in front of the front vertex.


Locating  the  Image  from  the  Telephoto  lens 

We can locate the image of an object at a finite distance (say, OV = 3000 mm) using
any of the three methods: “brute-force” calculation, the Gaussian imaging
formula for distances measured from the principal points, or the Newtonian
imaging equation.


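Gaussian Formula “step by step” The distance from the object to the first thin
lens is 3000 mm, so the intermediate image distance satisfies:

$$\frac{1}{s_1} + \frac{1}{s_1'} = \frac{1}{f_1}$$

$$s_1' = \left(\frac{1}{100\text{ mm}} - \frac{1}{3000\text{ mm}}\right)^{-1} = \frac{3000}{29}\text{ mm} = 103.45\text{ mm}$$

The transverse magnification of the image from the first lens is:

$$(M_T)_1 = -\frac{s_1'}{s_1} = -\frac{1}{29}$$

The object distance to the second lens is negative:

$$s_2 = t - s_1' = 80\text{ mm} - \frac{3000}{29}\text{ mm} = -\frac{680}{29}\text{ mm} = -23.45\text{ mm}$$

so the object is virtual. The image distance from the second lens is:

$$\frac{1}{s_2} + \frac{1}{s_2'} = \frac{1}{f_2}$$

$$s_2' = \left(\frac{1}{-25\text{ mm}} + \frac{29}{680\text{ mm}}\right)^{-1} = +\frac{3400}{9}\text{ mm} = +377.8\text{ mm}$$

The corresponding transverse magnification is:

$$(M_T)_2 = -\frac{s_2'}{s_2} = -\frac{+\frac{3400}{9}\text{ mm}}{-\frac{680}{29}\text{ mm}} = +\frac{145}{9} \approx +16.1$$

The system magnification is the product of the component transverse magnifications:

$$M_T = (M_T)_1 \cdot (M_T)_2 = \left(-\frac{1}{29}\right)\left(+\frac{145}{9}\right) = -\frac{5}{9}$$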


Gaussian Formula from Principal Points Now evaluate the same image using
the Gaussian formula for distances measured from the principal points. The distance
from the object to the object-space principal point is:

$$s = \overline{OH} = \overline{OV} + \overline{VH} = 3000\text{ mm} + (-1600\text{ mm}) = +1400\text{ mm}$$

The image distance measured from the image-space principal point is found from the
Gaussian image formula:

$$\frac{1}{s'} = \frac{1}{f_{\text{eff}}} - \frac{1}{s}$$

$$s' = \overline{H'O'} = \left(\frac{1}{500\text{ mm}} - \frac{1}{1400\text{ mm}}\right)^{-1} = \frac{7000}{9}\text{ mm} = 777.8\text{ mm}$$

The distance from the rear vertex to the image is found from the known value for
$\overline{H'V'} = +400$ mm:

$$\overline{V'O'} = \overline{H'O'} - \overline{H'V'} = +\frac{7000}{9}\text{ mm} - 400\text{ mm} = \frac{3400}{9}\text{ mm} = 377.8\text{ mm}$$

thus matching the distance obtained using “brute force”. The transverse magnification
of the image created by the system is:

$$M_T = -\frac{s'}{s} = -\frac{+\frac{7000}{9}\text{ mm}}{+1400\text{ mm}} = -\frac{5}{9}$$


Newtonian Formula Now repeat the calculation for the image position using the
Newtonian lens formula. The distance from the object to the object-space focal point
is:

$$x = \overline{OF} = \overline{OV} + \overline{VF} = \overline{OV} - \overline{FV} = 3000\text{ mm} - 2100\text{ mm} = 900\text{ mm}$$

Therefore the distance from the image-space focal point to the image is:

$$x' = \overline{F'O'} = \frac{f_{\text{eff}}^2}{x} = \frac{(500\text{ mm})^2}{900\text{ mm}} = \frac{2500}{9}\text{ mm} = 277.8\text{ mm}$$

So the distance from the rear (image-space) vertex V' to the image is:

$$\overline{V'O'} = \overline{V'F'} + \overline{F'O'} = 100\text{ mm} + \frac{2500}{9}\text{ mm} = \frac{3400}{9}\text{ mm} = 377.8\text{ mm}$$

which  again  agrees  with  the  result  obtained  by  the  other  two  methods. 
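The three methods can be compared side by side in code. This is a minimal sketch, assuming thin lenses in air and the telephoto values of this section; the function names are illustrative. The cardinal distances are computed from the same expressions used in the text, with FFD and BFD written over the common denominator $(f_1 + f_2) - t$.

```python
from fractions import Fraction as F


def image_distance(s, f):
    """Solve 1/s + 1/s' = 1/f for s'."""
    return 1 / (F(1) / f - F(1) / s)


def locate_image_three_ways(ov, f1, f2, t):
    """Return V'O' computed by (step-by-step, principal points, Newtonian)."""
    denom = F(f1 + f2 - t)
    f_eff = f1 * f2 / denom
    bfd = f2 * (f1 - t) / denom          # V'F'
    ffd = f1 * (f2 - t) / denom          # FV
    vh = f_eff - ffd                     # VH  = FH - FV
    hpvp = f_eff - bfd                   # H'V' = H'F' - V'F'

    # 1. Image through each lens in turn
    s1p = image_distance(ov, f1)
    step = image_distance(t - s1p, f2)

    # 2. Gaussian equation with distances measured from H and H'
    gauss = image_distance(ov + vh, f_eff) - hpvp

    # 3. Newtonian equation x * x' = f_eff^2, distances measured from F and F'
    x = ov - ffd
    newton = bfd + f_eff ** 2 / x

    return step, gauss, newton


results = locate_image_three_ways(3000, 100, -25, 80)
print([f"{float(r):.1f} mm" for r in results])
```

All three entries come out as 3400/9 mm (about 377.8 mm), matching the hand calculations.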


10.6  Stops  and  Pupils 


In any multielement optical system, the beam of light that passes through the system
is shaped like a circular “spindle” with different radii at different axial locations. The
diameter of a specific element limits the size of this spindle of rays that enters or
exits the system. This element is the stop of the system and may be a lens or an
aperture with no power (i.e., an iris diaphragm) that is placed specifically to limit the
ray cone. Obviously, in an imaging system composed of a single lens, that lens is also
the stop of the system. In a two-element system, the stop must be one of the two lenses; which
lens is determined by the relative sizes. The image of the stop seen from the input
“side” of the lens is the entrance pupil, which determines the extent of the ray cone
from the object that “gets into” the optical system, and thus the “brightness” of the
image. The image of the stop seen from the output “side” is the exit pupil.


The  locations  and  sizes  of  the  pupils  are  determined  by  applying  the  ray-optics 
imaging  equation  to  these  objects.  To  some,  the  concept  of  finding  the  image  of  a 
lens  may  seem  confusing,  but  it  is  no  different  from  before  -  just  think  of  the  lens  as 
a  regular  opaque  object. 
A three-lens imaging system where the second lens is the stop, its image seen
through the first lens is the entrance pupil, and its image seen through the last lens
is the exit pupil.

Consider  the  stops  and  pupils  of  the  Galilean  telescope.  Which  element  is  the  stop 
depends  on  the  relative  sizes  of  the  lenses.  In  the  first  case  shown  below,  the  first  lens 
(the  objective)  is  small  enough  that  it  acts  as  the  stop  (and  thus  also  the  entrance 
pupil).  The  image  of  the  objective  lens  seen  through  the  eyelens  is  the  exit  pupil, 
and  is  “between”  the  two  lenses  and  very  small.  Because  the  exit  pupil  is  small,  so  is 
the  field  of  view  of  the  Galilean  telescope.  In  the  second  example,  the  smaller  eyelens 
is  the  stop  and  also  the  exit  pupil,  while  the  image  of  the  eyelens  seen  through  the 
objective  is  the  entrance  pupil  and  is  far  behind  the  eyelens  and  relatively  large. 

10.6.1  Stop  and  Pupils  of  Galilean  and  Keplerian  Telescopes 

Consider the two two-lens telescope designs: the Galilean telescope has a positive-power objective and a negative-power ocular (or eyelens), while the Keplerian telescope has a positive objective and a positive eyelens. Assume that the objective is identical in the two cases, with $f_1 = +100$ mm and diameter $d_1 = 30$ mm. The oculars have focal lengths $f_2 = \pm 15$ mm (negative for the Galilean, positive for the Keplerian) and diameter $d_2 = 15$ mm (these are the approximate dimensions and focal lengths of the lenses in the OSA Optics Discovery Kit). The lenses of a telescope are separated by $f_1 + f_2$, or 85 mm for the Galilean and 115 mm for the Keplerian. We want to locate the stops and pupils. The stop is found by tracing a ray from an object at $\infty$ through the edge of the first element and finding the ray height at the second lens. If this ray height is small enough to pass through the second lens, then the first lens is the stop; if not, then the second lens is the stop.

Consider the Galilean telescope first. The ray height at the first lens is the “semidiameter” of the lens: $y = 15$ mm. From there, the ray height would decrease to 0 mm at a distance of $f_1 = +100$ mm, but it encounters the negative lens at a distance $t = +85$ mm. The ray height at this lens is $\frac{100\ \text{mm} - 85\ \text{mm}}{100\ \text{mm}} \cdot 15\ \text{mm} = 2.25$ mm, which is much smaller than the lens semidiameter of $y = 7.5$ mm. Since the ray “bundle” is constrained by the diameter of the first lens, it is the aperture stop of the system.


The  entrance  pupil  is  the  image  of  the  stop  as  seen  through  all  of  the  elements 
that  come  before  the  stop.  In  this  example,  the  first  lens  is  also  the  entrance  pupil, 
so  the  transverse  magnification  of  the  entrance  pupil  is  unity. 


The exit pupil is the image of the stop through all elements that come afterwards, i.e., just the negative lens. The distance to the “object” is $f_1 + f_2 = 85$ mm, so the imaging equation is used to locate the exit pupil and determine its magnification:

$$\frac{1}{85\ \text{mm}} + \frac{1}{s'} = \frac{1}{f_2} = \frac{1}{-15\ \text{mm}}$$

$$\frac{1}{s'} = -\frac{1}{15\ \text{mm}} - \frac{1}{85\ \text{mm}} \implies s' = -\frac{51}{4}\ \text{mm} = -12.75\ \text{mm}$$

$$M_T = -\frac{s'}{s} = -\frac{-12.75\ \text{mm}}{85\ \text{mm}} = +0.15$$

The exit pupil is upright, but more important, it is virtual and thus is not accessible to the eye (you can’t put your eye at the exit pupil of a Galilean telescope).


Follow the same procedure to determine the stop and locate the pupils and their magnifications for the Keplerian telescope. The ray height at the first lens for an object located at $\infty$ is again 15 mm. The ray height decreases to 0 mm at the focal point and then continues to decrease (becoming negative) until the ray encounters the ocular lens at a distance of $f_1 + f_2 = 115$ mm. The ray height $h$ at this lens is determined from similar triangles:

$$\frac{15\ \text{mm}}{100\ \text{mm}} = \frac{-h}{15\ \text{mm}} \implies h = -2.25\ \text{mm}$$


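So the first lens is the stop and entrance pupil (with unit magnification) in this case too, because $|h| = 2.25$ mm is smaller than the 7.5 mm semidiameter of the eyelens. The distance from the stop to the second lens is $f_1 + f_2 = 115$ mm, so the imaging equations for locating the exit pupil and determining its magnification are:

$$\frac{1}{115\ \text{mm}} + \frac{1}{s'} = \frac{1}{f_2} = \frac{1}{+15\ \text{mm}}$$

$$\frac{1}{s'} = \frac{1}{15\ \text{mm}} - \frac{1}{115\ \text{mm}} \implies s' = +\frac{69}{4}\ \text{mm} = +17.25\ \text{mm}$$

$$M_T = -\frac{s'}{s} = -\frac{17.25\ \text{mm}}{115\ \text{mm}} = -0.15$$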


The exit pupil of a Keplerian telescope is a real image - we can place our eye at it.

Galilean telescope for object at $s_1 = +\infty$: (a) The objective lens is the stop because it limits the cone of entering rays (it also is the entrance pupil). The image of the stop seen through the eyelens is the exit pupil, and is very small; (b) The eyelens is the stop and the exit pupil. The image of the eyelens seen through the objective is the entrance pupil, and is behind the eyelens because the object distance to the objective is less than a focal length.
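
The stop test and exit-pupil calculation for the two telescopes can be sketched in a few lines of Python. This is only an illustrative sketch under the thin-lens assumptions used above; the function name `stop_and_exit_pupil` and its return layout are hypothetical, and the exit-pupil values are meaningful only when the objective turns out to be the stop.

```python
def stop_and_exit_pupil(f1, d1, f2, d2):
    """Two thin lenses separated by f1 + f2, with an on-axis object at infinity.

    Returns (stop, h2, s_prime, m_t); all lengths are in mm.
    """
    t = f1 + f2                      # lens separation
    # The ray through the edge of lens 1 heads for the focal point at f1,
    # so similar triangles give its height at lens 2.
    h2 = (d1 / 2) * (f1 - t) / f1
    stop = "objective" if abs(h2) <= d2 / 2 else "eyelens"
    # Exit pupil when the objective is the stop: image of lens 1 through
    # lens 2, from 1/s + 1/s' = 1/f2 with s = t.
    s_prime = 1 / (1 / f2 - 1 / t)
    m_t = -s_prime / t
    return stop, h2, s_prime, m_t


galilean = stop_and_exit_pupil(100.0, 30.0, -15.0, 15.0)
keplerian = stop_and_exit_pupil(100.0, 30.0, +15.0, 15.0)
for name, result in (("Galilean", galilean), ("Keplerian", keplerian)):
    stop, h2, s_prime, m_t = result
    print(f"{name}: stop={stop}, h2={h2:+.2f} mm, s'={s_prime:+.2f} mm, M_T={m_t:+.3f}")
```

With the object distance to the eyelens taken as the lens separation, the sketch gives an exit pupil at -12.75 mm with magnification +0.15 for the Galilean case and at +17.25 mm with magnification -0.15 for the Keplerian case.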


10.6.2 System f-Number

The “brightness” of a recorded image is determined by the ability of the lens to gather light and by the area of the image. This section considers the “light-gathering power” of an optical system that creates a real image, i.e., one that can be placed on a sensor (sheet of film or CCD). To create a real image, the object distance $s$ must be larger than the focal length $f_{eff}$. The transverse magnification of the system is:

$$M_T = \frac{h'}{h} = -\frac{s'}{s} = -\frac{1}{s}\left(\frac{1}{f_{eff}} - \frac{1}{s}\right)^{-1} = \frac{f_{eff}}{f_{eff} - s} = -\frac{f_{eff}}{s}\left(\frac{1}{1 - \frac{f_{eff}}{s}}\right)$$

Since $s > f_{eff}$ to ensure that the image is real, then $\frac{f_{eff}}{s} < 1$ and the second factor can be expanded using the well-known series:

$$\frac{1}{1-t} = \sum_{n=0}^{\infty} t^n = 1 + t + t^2 + \cdots \quad \text{if } |t| < 1$$

Hence

$$M_T = -\frac{f_{eff}}{s}\left[1 + \frac{f_{eff}}{s} + \left(\frac{f_{eff}}{s}\right)^2 + \cdots\right]$$

Note that $M_T < 0$. If $s \gg f_{eff}$ (as often is the case; the object is quite far from the system), then we can truncate the series after the first term and we can say that:

$$M_T \cong -\frac{f_{eff}}{s} \propto f_{eff} \quad \text{if } f_{eff} \ll s$$

In words, the transverse magnification is approximately proportional to the focal length for real images of objects located many focal lengths from the system. This refers to the linear dimension, so the area of the image is proportional to $(M_T)^2 \propto f_{eff}^2$.

The “brightness” of a recorded image also is determined by the ability of the lens to gather light, which is determined by the area of the entrance pupil. This is (of course) proportional to the square of the diameter $D$ of the entrance pupil and thus to the square of the diameter of the aperture stop. If we combine the contributions from the entrance pupil and the transverse magnification, we can see that the “flux density” of light (or the “brightness” of the image) is proportional to

$$\text{flux density} \propto \left(\frac{D}{f_{eff}}\right)^2$$

The ratio $D / f_{eff}$ is sometimes called the relative aperture and its reciprocal is the f-number or f-ratio of the system:

$$f/\# = \frac{f_{eff}}{D}$$

For a fixed focal length, a system with a smaller f-number collects more light and thus produces a brighter image.
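
As a small numerical illustration of the proportionality above, the sketch below compares two aperture settings of the same lens. The 50 mm focal length and the two pupil diameters are hypothetical values chosen only for the example, and the function names are not from the text.

```python
def f_number(f_eff, pupil_diameter):
    """f/# = f_eff / D, with both lengths in the same units."""
    return f_eff / pupil_diameter


def relative_flux_density(f_eff, pupil_diameter):
    """Quantity proportional to the image flux density: (D / f_eff)**2."""
    return (pupil_diameter / f_eff) ** 2


# Hypothetical 50 mm lens with pupil diameters of 25 mm and 12.5 mm.
n_wide = f_number(50.0, 25.0)
n_narrow = f_number(50.0, 12.5)
ratio = relative_flux_density(50.0, 25.0) / relative_flux_density(50.0, 12.5)
print(n_wide, n_narrow, ratio)
```

Halving the pupil diameter doubles the f-number and reduces the flux density by a factor of four in this sketch.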


10.7  Ray  Tracing 

The imaging equation(s) become quite complicated in systems with more than a very few lenses. However, we can determine the effect of the optical system by ray tracing, where the action of the system on two (or more) rays is determined. Ray tracing may be paraxial or exact. Historically, graphical, matrix, or worksheet ray tracing was commonly used in optical design, but most ray tracing is now implemented in computer software, so that exact solutions are more commonly computed than before.

10.7.1  Marginal  and  Chief  Rays 

Many important characteristics of an optical system are determined by two specific rays through the system. The marginal ray travels from the optic axis at the center of the object, just grazes the edge of the stop, and then travels to the center of the image. The chief (or principal) ray travels from the edge of the object through the center of the stop to the edge of the image. An image is created wherever the marginal ray crosses the axis. The chief ray crosses the axis at the stop and the pupils.


The marginal and chief rays for a two-element imaging system where the second element is the stop. The marginal ray comes from the center of the object O, grazes the edge of the stop, and passes through the center of the image O'. The chief ray travels from the edge of the object through the center of the stop to the edge of the image.


10.8  Paraxial  Ray  Tracing  Equations 

Consider  the  schematic  of  a  two-element  optical  system: 


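Schematic of rays in ray tracing, using the marginal ray as an example. The ray height at the nth element is $y_n$ and the ray angle during transfer between elements $n-1$ and $n$ is $u_n$. The system has two elements with powers $\varphi_1$ and $\varphi_2$, represented by the pairs of principal planes ($H_1$, $H_1'$) and ($H_2$, $H_2'$). The refractive indices are $n = n_1$ before the first element, $n_1' = n_2$ between the elements, and $n_2' = n_3$ after the second element.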

The two elements are represented by their two principal “planes”, which are the planes of unit magnification. The refractive power of the first element changes the ray angle of the input ray. In the example shown, the input ray angle $u_1 = 0$ radians, i.e., the ray is parallel to the optical axis. The height of this ray above the axis at the object-space principal plane $H_1$ is $y_1$ units. The ray emerges from the principal plane $H_1'$ at the same height $y_1$ but with a new ray angle $u_2$. The ray “transfers” to the second element through the distance $t_2$ in the index $n_2$ and has ray height $y_2$ at principal plane $H_2$. The ray emerges from the principal plane $H_2'$ at the same height but with a new angle $u_3$.


10.8.1  Paraxial  Refraction 

Consider  refraction  of  a  paraxial  ray  emitted  from  the  object  O  at  a  surface  with 
radius  of  curvature  R.  For  a  paraxial  ray,  the  surface  may  be  drawn  as  “vertical”. 
The  height  of  the  ray  at  the  surface  is  y. 


Refraction of a paraxial ray at a surface with radius of curvature $R$ between media with refractive indices $n$ and $n'$. The ray height and angle at the surface are $y$ and $u$, respectively. The angle of the ray measured at the center of curvature is $\alpha$. The height and angle immediately after refraction are $y$ and $u'$. The object and image distances are $s$ and $s'$.


From the drawing, the incoming ray angle is:

$$u = \tan^{-1}\left[\frac{y}{s}\right] \cong \frac{y}{s} > 0$$

The corresponding equation for the outgoing ray is:

$$u' = \tan^{-1}\left[\frac{y}{s'}\right] \cong \frac{y}{s'} > 0$$

and the angle of the ray measured at the center of curvature is:

$$\alpha = -\tan^{-1}\left[\frac{y}{|R|}\right] \cong -\frac{y}{R}$$

Snell’s law tells us that:

$$n \sin[i] = n' \sin[i']$$

In the paraxial approximation, the two sides become:

$$n \sin[i] = n \sin[u - \alpha] \cong n\,[u - \alpha]$$

$$n' \sin[i'] = n' \sin[u' - \alpha] \cong n'\,[u' - \alpha]$$

$$\implies n\,[u - \alpha] = n'\,[u' - \alpha]$$

$$n'u' = nu - n\alpha + n'\alpha = nu + \alpha\,(n' - n) = nu - y\,\frac{(n' - n)}{R} = nu - y\varphi$$

The paraxial refraction equation in terms of the incident angle $u$, refracted angle $u'$, ray height $y$, surface power $\varphi = \frac{n' - n}{R}$, and indices of refraction $n$ and $n'$ is:

$$n'u' = nu - y\varphi$$


10.8.2  Paraxial  Transfer 


The paraxial transfer equation: the ray traverses the distance $t$ in the medium with index $n'$. The initial and final ray heights are $y$ and $y'$, respectively. The angle is $u' \cong \frac{y' - y}{t}$, so that $y' = y + t u' = y + \frac{t}{n'}(n'u')$.

The transfer equation determines the ray height $y'$ at the next surface given the initial ray height $y$, the distance $t$, and the angle $u'$. From the drawing, we have:

$$y' = y + t u'$$

$$y' = y + \left(\frac{t}{n'}\right)(n'u')$$

where the substitution was made to put the ray angle in the same form $n'u'$ that appeared in the refraction equation. The distance $\frac{t}{n'} \le t$ is called the reduced thickness.


10.8.3  Linearity  of  the  Refraction  and  Transfer  Equations 

Note that both the refraction and transfer equations are linear in the height and angle, i.e., neither includes any operations involving squares or nonlinear functions (such as sine, logarithm, or tangent). Among other things, this means that they may be scaled by direct multiplication to obtain other “equivalent” rays. For example, the output angle may be scaled by scaling the input ray angle and the height by a constant factor $a$:

$$a\,(nu - y\varphi) = a \cdot (nu) - (a \cdot y)\,\varphi = a\,(n'u')$$

We will often take advantage of this linear scaling property to find the exact marginal and chief rays from their provisional counterparts.
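
The refraction and transfer equations, and the scaling property, can be sketched in Python as follows. The function names are hypothetical, and the numerical values (a single surface with $R = 7.8$ mm between indices 1.0 and 1.336, followed by a 3.6 mm transfer) are assumed here purely for illustration.

```python
def refract(y, nu, phi):
    """Paraxial refraction: n'u' = nu - y*phi; the height is unchanged."""
    return y, nu - y * phi


def transfer(y, nu, t, n):
    """Paraxial transfer: y' = y + (t/n)*(n'u'); the reduced angle is unchanged."""
    return y + (t / n) * nu, nu


def refract_then_transfer(y, nu, phi, t, n):
    return transfer(*refract(y, nu, phi), t, n)


phi = (1.336 - 1.0) / 7.8    # surface power [1/mm]
t, n = 3.6, 1.336            # thickness [mm] and index after the surface

y2, nu2 = refract_then_transfer(1.0, 0.0, phi, t, n)

# Scaling the input ray by a constant scales the output ray by the same constant.
scale = 2.5
y2_scaled, nu2_scaled = refract_then_transfer(scale * 1.0, scale * 0.0, phi, t, n)
print(y2, nu2)
print(y2_scaled / y2, nu2_scaled / nu2)
```

Both printed ratios equal the scale factor (to floating-point precision), which is the linearity that allows provisional rays to be rescaled after the trace.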


10.8.4  Paraxial  Ray  Tracing 

To  characterize  the  paraxial  properties  of  a  system,  two  provisional  rays  are  traced: 


1.  Initial  ray  height  (at  first  surface)  y  =  1.0,  initial  angle  nu  =  0 

2.  Initial  ray  height  (at  first  surface)  y  =  0.0,  initial  angle  nu  =  1 


We  have  already  named  these  rays;  the  first  is  the  provisional  marginal  ray  that 
intersects  the  optical  axis  at  the  object  (and  thus  also  at  every  image  of  the  object). 
The  second  ray  is  called  the  provisional  chief  (or  principal)  ray  and  travels  from  the 
edge  of  the  object  to  the  edge  of  the  field  of  view  through  the  center  of  the  stop  (and 
thus  through  the  centers  of  the  pupils,  which  are  images  of  the  stop). 

The process of ray tracing is perhaps best introduced by example. Consider a two-element system with three surfaces. The three radii of curvature are $R_1 = +7.8$ mm, $R_2 = +10$ mm, and $R_3 = -6$ mm. The distance between the first two surfaces (the thickness of the first element) and between the second and third surfaces are both 3.6 mm. The refractive index before the first surface is $n_1 = 1.0$, between the first two surfaces is $n_2 = 1.336$, and between the second and third surfaces is $n_3 = 1.413$. The index after the last surface is $n_4 = n_2 = 1.336$.


The first action of the system is paraxial refraction at the first surface, which changes the ray angle but not the ray height. The new ray angle for the provisional marginal ray is:

$$(n'u')_1 = (nu)_1 - y_1\,[\text{mm}] \cdot \varphi_1\,[\text{mm}^{-1}] = 0 - (1.0)(+0.043077) = -0.043077\ \text{radian}$$

(note the retention of six decimal places; after cascading the large number of calculations for a complex system, the precision of the final results will be significantly poorer). The paraxial transfer equation for the provisional marginal ray between the first and second surface changes the height of the ray but not the angle. The height at the second surface is:

$$y_1' = y_1 + \frac{t_1'}{n_1'}\,(n'u')_1 = 1 + \frac{3.6}{1.336}\,(-0.043077) = +0.883924\ \text{mm}$$

Thus the ray exits the first surface at the “reduced angle” $n'u' \cong -0.04$ radians and arrives at the second surface at height $y' \cong +0.88$ mm. The corresponding equations for the provisional chief ray at the first surface are:

$$(n'\bar{u}')_1 = (n\bar{u})_1 - \bar{y}_1 \varphi_1 = 1 - (0.0)(+0.043077) = 1\ \text{radian}$$

$$\bar{y}_1' = \bar{y}_1 + \frac{t_1'}{n_1'}\,(n'\bar{u}')_1 = 0 + \frac{3.6}{1.336}\,(1) = +2.694611\ \text{mm}$$

Since the provisional chief ray went through the center of the first surface ($\bar{y}_1 = 0$), the ray angle $n\bar{u}$ did not change. The height of the chief ray at the second surface ($\bar{y}_1' = \bar{y}_2$) is proportional to the initial ray angle $(n\bar{u})_1$.

The equations are evaluated in sequence to compute the rays through the system. These are presented in the table. Each column in the table represents a surface in the system and the “primed” quantities refer to distances and angles following the surface. In words, the entries $t'$ are the distances from the surface in the column to the next surface.

                      Object    Surface 1         Surface 2         Surface 3         Image
R                               +7.8 mm           +10.0 mm          -6.0 mm
t'                              3.6 mm            3.6 mm
n'                    1.0       1.336             1.413             1.336
-phi = -(n'-n)/R                -0.043077 mm^-1   -0.007700 mm^-1   -0.012833 mm^-1
t'/n'                           2.694611 mm       2.547771 mm       12.699 mm
y    (marginal)                 1 mm              0.883924 mm       0.756833 mm       0 mm
n'u' (marginal)       0         -0.043077 rad     -0.049883 rad     -0.059596 rad     -0.060 rad
y    (chief)                    0 mm              2.694611 mm       5.189519 mm       16.779 mm
n'u' (chief)          1 rad     1 rad             0.979251 rad      0.913 rad

The provisional marginal ray emerges from the last surface with height and angle specified by the vector

$$\begin{bmatrix} y \\ n'u' \end{bmatrix} = \begin{bmatrix} 0.756833\ \text{mm} \\ -0.059596\ \text{radians} \end{bmatrix}$$

so the distance to the location where the marginal ray height is 0 may be evaluated by tracing the exiting ray “forward” to the location of the image:

$$y' = 0 = y + \frac{t'}{n'}\,(n'u')$$

$$0 = (+0.756833) + \frac{t'}{n'}\,(-0.059596)$$

$$\frac{t'}{n'} = \frac{+0.756833}{0.059596} \cong 12.699\ \text{mm}$$

$$t' = 12.699\ \text{mm} \cdot n' = 12.699 \cdot 1.336 \cong 16.966\ \text{mm}$$

The height and angle of the provisional chief ray at the image location are $\bar{y} \cong 16.78$ mm and $n'\bar{u}' \cong 0.91$ radians, respectively.

This system is often used as a model for the human eye if the lens is relaxed to view objects at $\infty$. The first surface is the cornea of the eye, while the other two surfaces are the front and back of the lens. Note that the power of the cornea ($0.043077\ \text{mm}^{-1} \cong 43$ diopters) is considerably larger than the powers of the lens surfaces (7.7 and 12.8 diopters, respectively); in other words, most of the refraction of the eye system occurs at the cornea.
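
The sequence of refractions and transfers for this three-surface system can be sketched as a short loop. This is a minimal sketch, not the worksheet layout of the text: the list `surfaces` and the function `trace` are hypothetical names, and each returned pair holds the ray height at a surface and the reduced angle just after refraction there.

```python
# (R [mm], index after the surface, distance to the next surface [mm] or None)
surfaces = [
    (7.8, 1.336, 3.6),
    (10.0, 1.413, 3.6),
    (-6.0, 1.336, None),
]


def trace(y, nu, n_in=1.0):
    """Trace a paraxial ray (height y, reduced angle nu) through `surfaces`."""
    n = n_in
    rows = []
    for radius, n_next, t in surfaces:
        phi = (n_next - n) / radius        # surface power
        nu = nu - y * phi                  # refraction
        rows.append((y, nu))
        if t is not None:
            y = y + (t / n_next) * nu      # transfer to the next surface
        n = n_next
    return rows


marginal = trace(1.0, 0.0)   # provisional marginal ray
chief = trace(0.0, 1.0)      # provisional chief ray

y3, nu3 = marginal[-1]
reduced_image_distance = -y3 / nu3            # t'/n' where the height reaches 0
image_distance = reduced_image_distance * 1.336
print(marginal)
print(chief)
print(reduced_image_distance, image_distance)
```

The printed heights and angles should reproduce the tabulated values to within rounding, with the image about 16.97 mm behind the last surface.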


10.8.5  Matrix  Formulation  of  Paraxial  Ray  Tracing 

The same linear paraxial ray tracing equations may be conveniently implemented as matrices acting on ray vectors for the marginal and chief rays whose components are the height and angle. The ray vectors may be defined as:

$$\begin{bmatrix} y \\ nu \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} \bar{y} \\ n\bar{u} \end{bmatrix}$$

Note that there is nothing magical about the convention for the ordering of $y$ and $nu$; this is the convention used by Roland Shack at the Optical Sciences Center at the University of Arizona, but Willem Brouwer wrote a book on matrix methods in optics that uses the opposite order (which Hecht also uses).

These column vectors may be combined to form a ray matrix $\mathcal{L}$, where the columns are the marginal and chief ray vectors:

$$\mathcal{L} = \begin{bmatrix} y & \bar{y} \\ nu & n\bar{u} \end{bmatrix}$$

which may be evaluated at any point in the system. The determinant of this ray matrix is the so-called Lagrange Invariant (which we will denote by the symbol $\aleph$, because it is the closest character available to the Cyrillic character usually used). As suggested by its name, the Lagrange Invariant is unaffected either by refraction or transfer all of the way through the system.

$$\det[\mathcal{L}] = y \cdot (n\bar{u}) - (nu) \cdot \bar{y} = \aleph$$
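
The invariance of the determinant can be checked numerically with exact rational arithmetic. The sketch below uses a hypothetical surface power and reduced thickness, chosen only for illustration, and evaluates the invariant at the start, after a refraction, and after a transfer.

```python
from fractions import Fraction


def refract(ray, phi):
    """Paraxial refraction: (y, nu) -> (y, nu - y*phi)."""
    y, nu = ray
    return (y, nu - y * phi)


def transfer(ray, reduced_t):
    """Paraxial transfer: (y, nu) -> (y + (t/n)*nu, nu)."""
    y, nu = ray
    return (y + reduced_t * nu, nu)


def lagrange_invariant(marginal, chief):
    """Determinant of the ray matrix whose columns are the two rays."""
    (y, nu), (y_bar, nu_bar) = marginal, chief
    return y * nu_bar - nu * y_bar


# Hypothetical surface power [1/mm] and reduced thickness [mm].
phi, reduced_t = Fraction(1, 50), Fraction(5)
marginal = (Fraction(1), Fraction(0))   # y = 1, nu = 0
chief = (Fraction(0), Fraction(1))      # y_bar = 0, nu_bar = 1

invariants = [lagrange_invariant(marginal, chief)]
marginal, chief = refract(marginal, phi), refract(chief, phi)
invariants.append(lagrange_invariant(marginal, chief))
marginal, chief = transfer(marginal, reduced_t), transfer(chief, reduced_t)
invariants.append(lagrange_invariant(marginal, chief))
print(invariants)
```

All three entries of `invariants` are identical, as expected from the unit determinants of the refraction and transfer operations.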


Refraction  Matrix 

Given the ray vectors or the ray matrix, we can now define operators for refraction and transfer. Recall that paraxial refraction of a marginal ray and of a chief ray at a surface with power $\varphi$ is:

$$n'u' = nu - y\varphi \quad \text{for marginal ray}$$

$$n'\bar{u}' = n\bar{u} - \bar{y}\varphi \quad \text{for chief ray}$$

The refraction process for the marginal ray may be written as a matrix $\mathcal{R}$ and the output is the product with the ray vector, which will have the same ray height and a different angle:

$$\mathcal{R}\begin{bmatrix} y \\ nu \end{bmatrix} = \begin{bmatrix} y \\ n'u' \end{bmatrix}$$

It is easy to see that the form of the matrix must be:

$$\begin{bmatrix} 1 & 0 \\ -\varphi & 1 \end{bmatrix}\begin{bmatrix} y \\ nu \end{bmatrix} = \begin{bmatrix} y \\ -y\varphi + nu \end{bmatrix} = \begin{bmatrix} y \\ n'u' \end{bmatrix}$$

and its determinant is unity:

$$\det\begin{bmatrix} 1 & 0 \\ -\varphi & 1 \end{bmatrix} = (1)(1) - (-\varphi)(0) = 1$$


Transfer  Matrix 

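The transfer of the marginal ray from one surface to the next is $y' = y + \frac{t}{n'}(n'u')$, which also may be written as the product of the matrix $\mathcal{T}$ with the ray vector:

$$\mathcal{T}\begin{bmatrix} y \\ n'u' \end{bmatrix} = \begin{bmatrix} y + (n'u')\left(\frac{t}{n'}\right) \\ n'u' \end{bmatrix}$$

$$\begin{bmatrix} 1 & \frac{t}{n'} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y \\ n'u' \end{bmatrix} = \begin{bmatrix} y' \\ n'u' \end{bmatrix}$$

so the determinant of the transfer matrix is also 1:

$$\det\begin{bmatrix} 1 & \frac{t}{n'} \\ 0 & 1 \end{bmatrix} = (1)(1) - (0)\left(\frac{t}{n'}\right) = 1$$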


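Note that we could operate on the ray matrix instead of individual ray vectors; this allows us to calculate both the marginal and chief rays at the same time:

$$\mathcal{R}\mathcal{L} = \begin{bmatrix} 1 & 0 \\ -\varphi & 1 \end{bmatrix}\begin{bmatrix} y & \bar{y} \\ nu & n\bar{u} \end{bmatrix} = \begin{bmatrix} y & \bar{y} \\ nu - y\varphi & n\bar{u} - \bar{y}\varphi \end{bmatrix}$$

$$\mathcal{T}\mathcal{L} = \begin{bmatrix} 1 & \frac{t}{n'} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y & \bar{y} \\ n'u' & n'\bar{u}' \end{bmatrix} = \begin{bmatrix} y + \left(\frac{t}{n'}\right)(n'u') & \bar{y} + \left(\frac{t}{n'}\right)(n'\bar{u}') \\ n'u' & n'\bar{u}' \end{bmatrix}$$

The refraction and transfer matrices may be combined in sequence to model a complete system. If we start with the marginal ray vector at the input object, the first operation is transfer to the first surface. The next is refraction by that surface, transfer to the next, and so forth until a final transfer to the output image:

$$\mathcal{T}_n \mathcal{R}_n \cdots \mathcal{T}_1 \mathcal{R}_1 \mathcal{T}_0\,(\mathcal{L}_{object}) = \mathcal{L}_{out}$$

If the initial ray matrix is located at the object (as usual), the marginal ray height is zero, so the ray matrix at the object and any images has the form:

$$\mathcal{L} = \begin{bmatrix} 0 & \bar{y} \\ nu & n\bar{u} \end{bmatrix}$$

The matrices appear to be laid out in inverse order, i.e., the last matrix first, but the transfer matrix $\mathcal{T}_0$ acts on the input ray matrix, so it must appear on the right.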


Ray  Matrix  for  Provisional  Marginal  and  Chief  Rays 

The system is characterized by using provisional marginal and chief rays located at the object. The linearity of the computations ensures that the rays may be scaled subsequently to satisfy other system constraints, such as the diameter of the stop. The provisional marginal ray at the object has height $y = 0$ and ray angle $nu = 1$, while the provisional chief ray at the object has height $\bar{y} = 1$ and angle $n\bar{u} = 0$. Thus the provisional ray matrix at the object is:

$$\mathcal{L}_0 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$


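“Vertex-to-Vertex Matrix” for System

The optical system matrix excludes the input ray matrix, the first transfer matrix, the last transfer matrix, and the output ray matrix. It is called the “vertex-to-vertex matrix” and is labeled $\mathcal{M}_{VV'}$:

$$\mathcal{M}_{VV'} = \mathcal{R}_n \mathcal{T}_{n-1} \cdots \mathcal{T}_1 \mathcal{R}_1 = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$$

where $A$, $B$, $C$, $D$ are real numbers to be determined. Since the determinant of the matrix product is the product of the determinants, we can see that

$$\det \mathcal{M}_{VV'} = 1 \implies AD - BC = 1$$

For example, find $\mathcal{M}_{VV'}$ for a two-lens system with powers $\varphi_1 = (f_1)^{-1}$ and $\varphi_2 = (f_2)^{-1}$ separated by $t$:

$$\mathcal{M}_{VV'} = \mathcal{R}_2 \mathcal{T}_1 \mathcal{R}_1 = \begin{bmatrix} 1 & 0 \\ -\varphi_2 & 1 \end{bmatrix}\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -\varphi_1 & 1 \end{bmatrix} = \begin{bmatrix} 1 - \varphi_1 t & t \\ -(\varphi_1 + \varphi_2 - \varphi_1\varphi_2 t) & 1 - \varphi_2 t \end{bmatrix}$$

It is easy to confirm that the determinant of this system matrix is unity.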

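To illustrate, consider the system of two thin lenses in the last section with $f_1 = 100$ mm, $f_2 = 50$ mm, and $t = 75$ mm, which we showed to have $f_{eff} = +\frac{200}{3}$ mm $\cong 66.7$ mm. The system matrix is:

$$\mathcal{M}_{VV'} = \begin{bmatrix} 1 - \varphi_1 t & t \\ -(\varphi_1 + \varphi_2 - \varphi_1\varphi_2 t) & 1 - \varphi_2 t \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$$

$$= \begin{bmatrix} 1 & 0 \\ -\frac{1}{50\ \text{mm}} & 1 \end{bmatrix}\begin{bmatrix} 1 & 75\ \text{mm} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -\frac{1}{100\ \text{mm}} & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{4} & 75\ \text{mm} \\ -\frac{3}{200\ \text{mm}} & -\frac{1}{2} \end{bmatrix}$$

So $A$ and $D$ are “pure” numbers, while $B$ and $C$ have dimensions of length and reciprocal length, respectively. From the values in the last section, we can see that $B = t$ and $C = -\frac{1}{f}$, which in turn demonstrates that the power of a two-lens system is:

$$\varphi = \frac{1}{f} = \varphi_1 + \varphi_2 - \varphi_1\varphi_2 t = \frac{1}{f_1} + \frac{1}{f_2} - \frac{t}{f_1 f_2}$$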

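The input ray matrix consists of the provisional marginal and chief ray at the object, which “passes through” the transfer matrix from object to front surface. If the object is located 1000 mm from the first surface, the ray matrix at the front vertex of the system is:

$$\mathcal{T}_0\begin{bmatrix} y & \bar{y} \\ nu & n\bar{u} \end{bmatrix} = \mathcal{T}_0\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 1000\ \text{mm} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1000\ \text{mm} & 1 \\ 1 & 0 \end{bmatrix}$$

The height of the provisional marginal ray at the front vertex is 1000 mm and the angle is 1 radian (which is a huge angle, but remember that all equations are linear, so the angle and ray height can be scaled to any value). The emerging provisional marginal ray is:

$$\begin{bmatrix} \frac{1}{4} & 75\ \text{mm} \\ -\frac{3}{200\ \text{mm}} & -\frac{1}{2} \end{bmatrix}\begin{bmatrix} 1000\ \text{mm} \\ 1 \end{bmatrix} = \begin{bmatrix} 325\ \text{mm} \\ -\frac{31}{2} \end{bmatrix} = \begin{bmatrix} y \\ n'u' \end{bmatrix}$$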

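In words, the marginal ray from an object 1000 mm in front of the lens emerges with height 325 mm and angle of $-\frac{31}{2}$ radians. To find the location of the image, find the distance until the marginal ray height $y = 0$:

$$\begin{bmatrix} 1 & \frac{t'}{n'} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 325\ \text{mm} \\ -\frac{31}{2} \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{31}{2} \end{bmatrix}$$

$$325\ \text{mm} - \frac{31}{2}\,\frac{t'}{n'} = 0 \implies \frac{t'}{n'} = +\frac{650}{31}\ \text{mm} \cong +20.97\ \text{mm}$$

which agrees with the result obtained earlier. We observed that the magnification of the image in this configuration is

$$M_T = -\frac{s'}{s} = -\frac{H'O'}{OH} = -\frac{2}{31}$$

so the provisional marginal ray at the image point has the form:

$$\begin{bmatrix} y' \\ n'u' \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{31}{2} \end{bmatrix} = \begin{bmatrix} 0 \\ \frac{1}{M_T} \end{bmatrix}$$

The marginal ray out of the vertex-to-vertex matrix for the object distance $OV = 1000$.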
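
The steps of this example (transfer from the object, multiplication by the vertex-to-vertex matrix, and solving for the zero of the ray height) can be sketched as follows. The names `apply`, `M_VV`, and `T_0` are hypothetical; the matrix entries are those of the two-lens example with $f_1 = 100$ mm, $f_2 = 50$ mm, and $t = 75$ mm.

```python
from fractions import Fraction


def apply(matrix, ray):
    """Multiply a 2x2 matrix by a ray vector (y, nu)."""
    (a, b), (c, d) = matrix
    y, nu = ray
    return (a * y + b * nu, c * y + d * nu)


# Vertex-to-vertex matrix of the two-lens example (lengths in mm).
M_VV = ((Fraction(1, 4), Fraction(75)), (Fraction(-3, 200), Fraction(-1, 2)))
T_0 = ((1, Fraction(1000)), (0, 1))      # transfer from object to front vertex

marginal_at_object = (Fraction(0), Fraction(1))
marginal_at_front = apply(T_0, marginal_at_object)
marginal_out = apply(M_VV, marginal_at_front)

# Solve y + (t'/n') * nu = 0 for the reduced distance to the image.
y_out, nu_out = marginal_out
reduced_image_distance = -y_out / nu_out
print(marginal_out, reduced_image_distance, float(reduced_image_distance))
```

The emerging ray is (325, -31/2) and the reduced image distance is 650/31 mm, about 20.97 mm.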
Matrices that Associate Conjugate Points

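We can write a general surface-to-surface matrix $\mathcal{M}_{VV'}$ in the form:

$$\mathcal{M}_{VV'} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$$

where the four coefficients are to be determined. The matrix that relates two conjugate planes $O$ and $O'$ may be obtained by adding transfer matrices for the appropriate distances from the object to the front vertex ($t_1 = OV$) and from the rear vertex to the image ($t_2 = V'O'$):

$$\mathcal{M}_{OO'} = \begin{bmatrix} 1 & t_2 \\ 0 & 1 \end{bmatrix}\mathcal{M}_{VV'}\begin{bmatrix} 1 & t_1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & t_2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} 1 & t_1 \\ 0 & 1 \end{bmatrix}$$

$$= \begin{bmatrix} A + t_2 C & (A + t_2 C)\,t_1 + B + t_2 D \\ C & C t_1 + D \end{bmatrix}$$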

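We know that the marginal ray heights at the object and image are zero, which thus sets some limits on the “conjugate-to-conjugate” matrix:

$$\begin{bmatrix} A + t_2 C & (A + t_2 C)\,t_1 + B + t_2 D \\ C & C t_1 + D \end{bmatrix}\begin{bmatrix} 0 & \bar{y}_{in} \\ (nu)_{in} & (n\bar{u})_{in} \end{bmatrix} = \begin{bmatrix} 0 & \bar{y}_{out} \\ (nu)_{out} & (n\bar{u})_{out} \end{bmatrix}$$

The upper-left element of the product requires that $(A + t_2 C)\,t_1 + B + t_2 D = 0$, and the upper-right element then shows that the ratio $\frac{\bar{y}_{out}}{\bar{y}_{in}} = A + t_2 C = M_T$, whereas the ratio of the marginal ray angles follows from the Lagrange invariant:

$$\aleph = y \cdot (n\bar{u}) - (nu) \cdot \bar{y}$$

$$\aleph = y_{out} \cdot (n\bar{u})_{out} - (nu)_{out} \cdot \bar{y}_{out} = y_{in} \cdot (n\bar{u})_{in} - (nu)_{in} \cdot \bar{y}_{in}$$

$$\implies \frac{y_{out} \cdot (n\bar{u})_{out} - (nu)_{out} \cdot \bar{y}_{out}}{y_{in} \cdot (n\bar{u})_{in} - (nu)_{in} \cdot \bar{y}_{in}} = 1$$

Because $y_{in} = y_{out} = 0$ at the conjugate planes, this reduces to:

$$\frac{(nu)_{out}}{(nu)_{in}} = \frac{\bar{y}_{in}}{\bar{y}_{out}} = \frac{1}{M_T}$$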

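The conjugate-to-conjugate matrix includes the leading and following ray transfers:

$$\mathcal{M}_{OO'} = \begin{bmatrix} A + t_2 C & (A + t_2 C)\,t_1 + B + t_2 D \\ C & C t_1 + D \end{bmatrix} = \begin{bmatrix} M_T & 0 \\ -\varphi & \frac{1}{M_T} \end{bmatrix}$$

$$\implies M_T = A + t_2 C = (C t_1 + D)^{-1}$$

$$\varphi = -C$$

$$0 = (A + t_2 C)\,t_1 + B + t_2 D$$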


We have four equations in the four unknowns $A$, $B$, $C$, $D$, which may be combined to find useful system metrics in terms of the elements in the vertex-to-vertex matrix $\mathcal{M}_{VV'}$:

distance from object to front vertex:
$$\frac{OV}{n} = \frac{t_1}{n} = \frac{\frac{1}{M_T} - D}{C} = -\frac{B + D t_2}{A + C t_2}$$

distance from rear vertex to image:
$$\frac{V'O'}{n'} = \frac{t_2}{n'} = \frac{M_T - A}{C} = -\frac{B + A t_1}{D + C t_1}$$

effective focal length of system:
$$f_{eff} = -\frac{1}{C}$$

front focal distance:
$$FFD = \frac{FV}{n} = -\frac{D}{C}$$

back focal distance:
$$BFD = \frac{V'F'}{n'} = -\frac{A}{C}$$

distance from front vertex to object-space principal point:
$$\frac{VH}{n} = \frac{D - 1}{C}$$

distance from image-space principal point to rear vertex:
$$\frac{H'V'}{n'} = \frac{A - 1}{C}$$
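Again, consider the example of a system composed of two thin lenses with $f_1 = +100$ mm, $f_2 = +50$ mm, and $t = +75$ mm:

$$\mathcal{M}_{VV'} = \begin{bmatrix} 1 & 0 \\ -\frac{1}{50} & 1 \end{bmatrix}\begin{bmatrix} 1 & 75 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -\frac{1}{100} & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{4} & 75 \\ -\frac{3}{200} & -\frac{1}{2} \end{bmatrix}$$

From the table of properties of the matrix, we see that:

$$f_{eff} = -\frac{1}{C} = +\frac{200}{3}\ \text{mm}$$

$$FFD = -\frac{D}{C} = -\frac{100}{3}\ \text{mm}$$

$$BFD = -\frac{A}{C} = +\frac{50}{3}\ \text{mm}$$

$$VH = \frac{D - 1}{C} = +100\ \text{mm}$$

$$H'V' = \frac{A - 1}{C} = +50\ \text{mm}$$

which again match the results obtained before.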
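
The system metrics of this example can be evaluated directly from the four matrix elements. The sketch below is illustrative only: the function name `cardinal_points` and the dictionary keys are hypothetical, and the sign choices (in particular $H'V' = (A - 1)/C$ and $FFD = -D/C$) are assumptions that follow the worked values quoted for this example.

```python
from fractions import Fraction


def cardinal_points(A, B, C, D):
    """Reduced distances derived from a vertex-to-vertex matrix [[A, B], [C, D]]."""
    return {
        "f_eff": -1 / C,        # effective focal length
        "FFD": -D / C,          # front focal point to front vertex (FV)
        "BFD": -A / C,          # rear vertex to back focal point (V'F')
        "VH": (D - 1) / C,      # front vertex to object-space principal point
        "H'V'": (A - 1) / C,    # image-space principal point to rear vertex
    }


# Two thin lenses with f1 = 100 mm, f2 = 50 mm, t = 75 mm.
points = cardinal_points(Fraction(1, 4), Fraction(75), Fraction(-3, 200), Fraction(-1, 2))
for name, value in points.items():
    print(f"{name} = {float(value):+.2f} mm")
```

As a consistency check, the sum of `FFD` and `VH` equals `f_eff`, as does the sum of `H'V'` and `BFD`.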


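The matrix that relates the object and image planes for the two-lens system presented above is:

$$\mathcal{M}_{OO'} = \mathcal{T}_2 \mathcal{M}_{VV'} \mathcal{T}_1 = \begin{bmatrix} 1 & \frac{650}{31} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{4} & 75 \\ -\frac{3}{200} & -\frac{1}{2} \end{bmatrix}\begin{bmatrix} 1 & 1000 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -\frac{2}{31} & 0 \\ -\frac{3}{200} & -\frac{31}{2} \end{bmatrix}$$

which has the form of the principal plane matrix except the diagonal elements are not both unity. However, note that they are reciprocals of each other, so that the determinant is still unity:

$$\det\begin{bmatrix} -\frac{2}{31} & 0 \\ -\frac{3}{200} & -\frac{31}{2} \end{bmatrix} = 1$$

We had evaluated the transverse magnification in this configuration to be $-\frac{2}{31}$, so we note that the upper-left component of the conjugate-to-conjugate matrix is the transverse magnification. The general form of a conjugate-to-conjugate matrix is:

$$\mathcal{M}_{conjugate} = \begin{bmatrix} M_T & 0 \\ -\varphi & \frac{1}{M_T} \end{bmatrix}$$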


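For the two-lens system that we have used as an example with the object located 1000 mm in front of the first lens, the conjugate-to-conjugate matrix is

$$\mathcal{M}_{OO'} = \begin{bmatrix} 1 & \frac{t'}{n'} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{4} & 75\ \text{mm} \\ -\frac{3}{200\ \text{mm}} & -\frac{1}{2} \end{bmatrix}\begin{bmatrix} 1 & 1000\ \text{mm} \\ 0 & 1 \end{bmatrix}$$

$$= \begin{bmatrix} 1 & +\frac{650}{31}\ \text{mm} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{4} & 75\ \text{mm} \\ -\frac{3}{200\ \text{mm}} & -\frac{1}{2} \end{bmatrix}\begin{bmatrix} 1 & 1000\ \text{mm} \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -\frac{2}{31} & 0 \\ -\frac{3}{200\ \text{mm}} & -\frac{31}{2} \end{bmatrix} = \begin{bmatrix} M_T & 0 \\ -\varphi & \frac{1}{M_T} \end{bmatrix}$$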
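
The structure of the conjugate-to-conjugate matrix can be confirmed by multiplying out the three matrices with exact fractions. This is a minimal sketch with hypothetical helper names; the distances $t_1 = 1000$ mm and $t_2 = 650/31$ mm are those of the example, in air.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def transfer(reduced_t):
    return ((1, reduced_t), (0, 1))


M_VV = ((Fraction(1, 4), Fraction(75)), (Fraction(-3, 200), Fraction(-1, 2)))
t1, t2 = Fraction(1000), Fraction(650, 31)

M_OO = matmul(transfer(t2), matmul(M_VV, transfer(t1)))
(m_t, upper_right), (minus_phi, inverse_m_t) = M_OO
print(M_OO)
```

The upper-right element vanishes, the upper-left element is the transverse magnification -2/31, the lower-left element is the negative of the system power, and the lower-right element is the reciprocal of the magnification.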


“Principal  Point-to-Principal  Point  Matrix”  for  System 

The conjugate-to-conjugate (object-to-image) matrix that relates the principal points of unit magnification is:

$$\mathcal{M}_{HH'} = \begin{bmatrix} 1 & 0 \\ -\varphi & 1 \end{bmatrix}$$

where $\varphi$ is the power of the system, which is the ability to deviate incoming rays.


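Example of “Vertex-to-Vertex” Matrix

To illustrate, calculate this matrix for the thin-lens telephoto considered in the last section with $f_1 = 100$ mm, $f_2 = -25$ mm, and $t = 80$ mm. The system matrix is:

$$\mathcal{M}_{VV'} = \begin{bmatrix} 1 - \varphi_1 t & t \\ -(\varphi_1 + \varphi_2 - \varphi_1\varphi_2 t) & 1 - \varphi_2 t \end{bmatrix}$$

$$= \begin{bmatrix} 1 & 0 \\ +\frac{1}{25\ \text{mm}} & 1 \end{bmatrix}\begin{bmatrix} 1 & 80\ \text{mm} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -\frac{1}{100\ \text{mm}} & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{5} & 80\ \text{mm} \\ -\frac{1}{500\ \text{mm}} & \frac{21}{5} \end{bmatrix}$$

So $A$ and $D$ are “pure” numbers, while $B$ and $C$ have dimensions of length and reciprocal length, respectively. From the values in the last section, we can see that $B = t$ and $C = -\frac{1}{f}$, which in turn demonstrates that the power of a two-lens system is:

$$\varphi = \frac{1}{f} = \varphi_1 + \varphi_2 - \varphi_1\varphi_2 t = \frac{1}{f_1} + \frac{1}{f_2} - \frac{t}{f_1 f_2}$$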


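The input ray matrix consists of the provisional marginal and chief ray at the object, which “passes through” the transfer matrix from object to front surface. If the object is located 1000 mm from the first surface, the ray matrix at the front vertex of the system is:

$$\mathcal{T}_0\begin{bmatrix} y & \bar{y} \\ nu & n\bar{u} \end{bmatrix} = \mathcal{T}_0\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 1000\ \text{mm} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1000\ \text{mm} & 1 \\ 1 & 0 \end{bmatrix}$$

The height of the provisional marginal ray at the front vertex is 1000 units and the angle is 1 radian (which is a huge angle, but remember that all paraxial equations are linear, so the angle and ray height can be scaled to any value). The emerging provisional marginal ray is:

$$\begin{bmatrix} \frac{1}{5} & 80\ \text{mm} \\ -\frac{1}{500\ \text{mm}} & \frac{21}{5} \end{bmatrix}\begin{bmatrix} 1000\ \text{mm} \\ 1 \end{bmatrix} = \begin{bmatrix} 280\ \text{mm} \\ \frac{11}{5} \end{bmatrix} = \begin{bmatrix} y \\ n'u' \end{bmatrix}$$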

In  words,  the  marginal  ray  from  an  object  1000  mm  in  front  of  the  lens  emerges  with 
height  280  nun  and  angle  of  +^-  radians.  To  find  the  location  of  the  image,  find  the 
distance  until  the  marginal  ray  height  y  =  0: 


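$$\mathcal{T}_{V'O'} \begin{bmatrix} 280\text{ mm} \\ \frac{11}{5} \end{bmatrix} = \begin{bmatrix} 1 & \frac{t'}{n'} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 280\text{ mm} \\ \frac{11}{5} \end{bmatrix} = \begin{bmatrix} 0 \\ \frac{11}{5} \end{bmatrix}$$

$$\implies 280\text{ mm} + \frac{11}{5}\cdot\frac{t'}{n'} = 0 \implies \frac{t'}{n'} = -280\text{ mm}\cdot\frac{5}{11} = -\frac{1400}{11}\text{ mm} \cong -127.3\text{ mm}$$

The negative distance means that the image is virtual and lies in front of the rear vertex.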
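The ray trace and the image distance can be reproduced in a few lines. This is a sketch only, assuming the telephoto matrix quoted above and the ray vector [y, nu]; the function name `apply` is illustrative.

```python
from fractions import Fraction


def apply(matrix, ray):
    """Apply a 2x2 ray-transfer matrix to a ray (y, nu)."""
    (a, b), (c, d) = matrix
    y, nu = ray
    return (a * y + b * nu, c * y + d * nu)


# Telephoto vertex-to-vertex matrix (lengths in mm).
system = ((Fraction(1, 5), Fraction(80)), (Fraction(-1, 500), Fraction(21, 5)))
# Transfer from the object to the front vertex, 1000 mm away.
to_front_vertex = ((Fraction(1), Fraction(1000)), (Fraction(0), Fraction(1)))

marginal_at_object = (Fraction(0), Fraction(1))   # height 0, angle 1
at_front = apply(to_front_vertex, marginal_at_object)
at_rear = apply(system, at_front)

y_out, nu_out = at_rear
# Reduced distance t'/n' from the rear vertex to the point where y = 0.
image_distance = -y_out / nu_out

print(at_front)                 # height 1000, angle 1
print(at_rear)                  # height 280, angle 11/5
print(float(image_distance))    # about -127.27
```

Because the paraxial equations are linear, scaling the input angle changes the output height and angle by the same factor and leaves `image_distance` unchanged.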


The  magnification  of  the  image  in  this  configuration  is 


AIj*  — 


OH  2 

HW  ~~  _  31 


The  marginal  ray  out  of  the  vertex-to-vertex  matrix  for  the  object  distance 

OV  =  1000. 
Calculating the Back Focal Distance (BFD)


The image of an object located at $\infty$ is the image-space focal point of the system. This ray enters the system with angle $nu = 0$ and arbitrary height, which we can model as $y = 1$. The emerging ray is:

$$\begin{bmatrix} y' \\ n'u' \end{bmatrix} = \begin{bmatrix} \frac{1}{4} & 75 \\ -\frac{3}{200} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{4} \\ -\frac{3}{200} \end{bmatrix}$$

The ray height is $\frac{1}{4}$ and the angle is $n'u' = -\frac{3}{200}$. The distance to the point where the ray height is zero is the back focal distance:


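$$\text{BFD} = V'F' = \frac{t'}{n'}: \qquad \begin{bmatrix} 1 & \frac{t'}{n'} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{4} \\ -\frac{3}{200} \end{bmatrix} = \begin{bmatrix} \frac{1}{4} - \frac{3}{200}\frac{t'}{n'} \\ -\frac{3}{200} \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{3}{200} \end{bmatrix}$$

$$\frac{1}{4} - \frac{3}{200}\cdot\frac{t'}{n'} = 0 \implies \frac{t'}{n'} = \frac{1}{4}\times\frac{200}{3} = \frac{50}{3} \cong 16.7\text{ units}$$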


Front  Focal  Distance  (FFD):  Ray  Through  “Reversed”  System 


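To find the front focal distance, we can trace the “provisional” marginal ray “backwards” through the system, or trace it through the “reversed” system where the lenses are placed in the opposite order. The “reversed” system matrix is:

$$(\mathcal{M}_{VV'})_{\text{reversed}} = \begin{bmatrix} 1 & 0 \\ -\frac{1}{100} & 1 \end{bmatrix} \begin{bmatrix} 1 & 75 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -\frac{1}{50} & 1 \end{bmatrix} = \begin{bmatrix} -\frac{1}{2} & 75 \\ -\frac{3}{200} & \frac{1}{4} \end{bmatrix}$$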

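Note that the “diagonal” elements of the “forward” and “reversed” vertex-to-vertex matrices are “swapped”, while the “off-diagonal” elements are identical.

If the input ray height is 1 and the angle is 0, the outgoing ray from the reversed matrix is:

$$\begin{bmatrix} -\frac{1}{2} & 75 \\ -\frac{3}{200} & \frac{1}{4} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -\frac{1}{2} \\ -\frac{3}{200} \end{bmatrix}$$

$$\begin{bmatrix} 1 & \frac{t}{n} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} -\frac{1}{2} \\ -\frac{3}{200} \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{3}{200} \end{bmatrix} \implies \text{FFD} = FV = \frac{t}{n} = -\frac{1}{2}\times\frac{200}{3} = -\frac{100}{3} \cong -33.3\text{ units}$$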
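Both focal distances follow directly from the matrix elements: a ray entering at height 1 and angle 0 leaves with height A and angle C, so it reaches the axis at the reduced distance -A/C, and the same argument applied to the reversed system (A and D swapped) gives -D/C. The sketch below assumes exact rational entries; the function name `focal_distances` is illustrative, and the numerical entries are those of the example above.

```python
from fractions import Fraction


def focal_distances(A, B, C, D):
    """Return (f_eff, BFD, FFD) for a vertex-to-vertex matrix [[A, B], [C, D]].

    Assumes exact (rational) entries so the unit-determinant test is exact.
    """
    if A * D - B * C != 1:
        raise ValueError("system matrix must have unit determinant")
    if C == 0:
        raise ValueError("afocal system: no finite focal points")
    return (-1 / C, -A / C, -D / C)


# Forward matrix of the example: A = 1/4, B = 75, C = -3/200, D = -1/2.
forward = (Fraction(1, 4), Fraction(75), Fraction(-3, 200), Fraction(-1, 2))
f_eff, bfd, ffd = focal_distances(*forward)
print(f_eff, bfd, ffd)   # 200/3 50/3 -100/3
```

The negative front focal distance indicates that the object-space focal point lies behind the front vertex for this pair of lenses.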

10.8.6  Examples  of  System  Matrices: 

Galilean  Telescope  made  of  Thin  Lenses 

The Galilean telescope consists of an objective lens with positive power and an eyelens with negative power separated by the sum of the focal lengths. If the focal lengths of the objective and eyelens are $f_1 = +200$ and $f_2 = -25$ units, the separation is $t = (200 - 25) = 175$ units. The system matrix is:

$$\mathcal{M}_{VV'} = \begin{bmatrix} 1 & 0 \\ -\frac{1}{(-25)} & 1 \end{bmatrix} \begin{bmatrix} 1 & 175 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -\frac{1}{(+200)} & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{8} & 175 \\ 0 & 8 \end{bmatrix}$$


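Note that the system power $\varphi = 0 \implies f_{eff} = \infty$, which means that the system is “afocal”. The ray from an object at $\infty$ with unit height generates the outgoing ray:

$$\begin{bmatrix} y' \\ n'u' \end{bmatrix} = \begin{bmatrix} \frac{1}{8} & 175 \\ 0 & 8 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{8} \\ 0 \end{bmatrix}$$

so the outgoing ray is at height $\frac{1}{8}$ and the angle is zero. Note that the diagonal elements are positive and the determinant is 1.

The “provisional” chief ray into the system has height 0 and angle 1; the outgoing ray is:

$$\begin{bmatrix} y' \\ n'u' \end{bmatrix} = \begin{bmatrix} \frac{1}{8} & 175 \\ 0 & 8 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 175 \\ 8 \end{bmatrix}$$

So the outgoing ray angle is 8 times larger.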


Keplerian  Telescope  made  of  Thin  Lenses 

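The Keplerian telescope with $f_1 = +200$ and $f_2 = +25$ units has separation $t = (200 + 25) = 225$ units. The system matrix is:

$$\mathcal{M}_{VV'} = \begin{bmatrix} 1 & 0 \\ -\frac{1}{(25)} & 1 \end{bmatrix} \begin{bmatrix} 1 & 225 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -\frac{1}{(+200)} & 1 \end{bmatrix} = \begin{bmatrix} -\frac{1}{8} & 225 \\ 0 & -8 \end{bmatrix}$$

The diagonal elements are negative, the determinant is 1, and the system power $\varphi = 0 \implies f_{eff} = \infty$. The ray from an object at $\infty$ with unit height generates the outgoing ray:

$$\begin{bmatrix} y' \\ n'u' \end{bmatrix} = \begin{bmatrix} -\frac{1}{8} & 225 \\ 0 & -8 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -\frac{1}{8} \\ 0 \end{bmatrix}$$

so the outgoing ray is at height $-\frac{1}{8}$ (the image is “inverted”) and the angle is zero.

The “provisional” chief ray into the system has height 0 and angle 1; the outgoing ray is:

$$\begin{bmatrix} y' \\ n'u' \end{bmatrix} = \begin{bmatrix} -\frac{1}{8} & 225 \\ 0 & -8 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 225 \\ -8 \end{bmatrix}$$

So the outgoing ray angle is 8 times larger than the incoming ray but negative.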
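Both telescope matrices can be generated from the closed form of the two-lens system matrix with the separation set to the sum of the focal lengths. The sketch below assumes thin lenses in air; the function name `telescope_matrix` is illustrative.

```python
from fractions import Fraction


def telescope_matrix(f1, f2):
    """System matrix of two thin lenses separated by t = f1 + f2."""
    p1 = Fraction(1) / f1          # power of the objective
    p2 = Fraction(1) / f2          # power of the eyelens
    t = Fraction(f1) + Fraction(f2)
    return ((1 - p1 * t, t), (-(p1 + p2 - p1 * p2 * t), 1 - p2 * t))


galilean = telescope_matrix(200, -25)
keplerian = telescope_matrix(200, 25)

for name, ((A, B), (C, D)) in (("Galilean", galilean), ("Keplerian", keplerian)):
    # C = 0 marks an afocal system; D is the angular magnification
    # seen by the provisional chief ray (height 0, angle 1).
    print(name, A, B, C, D)
```

The sign of the element D distinguishes the two designs: positive for the Galilean telescope (erect image) and negative for the Keplerian telescope (inverted image).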
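Thick Lens


Consider the matrix for a thick lens with:

$$\varphi_1 = \frac{n' - n}{R_1}, \qquad \varphi_2 = \frac{n - n'}{R_2}$$

The system matrix of the thick lens is:

$$\mathcal{M}_{VV'} = \begin{bmatrix} 1 & 0 \\ -\varphi_2 & 1 \end{bmatrix} \begin{bmatrix} 1 & \frac{t}{n'} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -\varphi_1 & 1 \end{bmatrix} = \begin{bmatrix} 1 - \varphi_1\frac{t}{n'} & \frac{t}{n'} \\ -\left(\varphi_1 + \varphi_2 - \varphi_1\varphi_2\frac{t}{n'}\right) & 1 - \varphi_2\frac{t}{n'} \end{bmatrix}$$

We can immediately identify the power of the thick lens, which may be written in the form of the effective focal length:

$$\varphi = \varphi_1 + \varphi_2 - \varphi_1\varphi_2\frac{t}{n'} \implies \frac{1}{f_{eff}} = \frac{1}{f_1} + \frac{1}{f_2} - \frac{t}{n' f_1 f_2}$$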


Consider an example made of glass with $n' = 1.5$, $R_1 = +50$ mm, and $R_2 = -100$ mm. The thickness of the lens is 10 mm. The powers of the surfaces are:


= 


n'  —  n  1.5  —  1 


R^ 


+2  = 


n  —  n 


The  system  matrix  is: 

Mvv'  = 


Ri 


1 


50  mm 
1  -  1.5 
—100  mm 


=  + 


1 


50  mm 
1 


=  + 


200  mm 


l 


1 

10  mm 

1.5 

0 

1 

200  mm 

0.867  6.67  mm 


100  mm 


1 


41.096  mm 


0.967 
The determinant is 1, as required. Substitute into the table of properties to find:

$$f_{eff} = -\frac{1}{C} = +41.096\text{ mm}$$

$$\text{FFD} = -\frac{D}{C} = 0.967\cdot 41.096\text{ mm} = +39.745\text{ mm}$$

$$\text{BFD} = -\frac{A}{C} = 0.867\cdot 41.096\text{ mm} = +35.635\text{ mm}$$

$$VH = \frac{D - 1}{C} = (0.967 - 1)\cdot(-41.096\text{ mm}) = +1.356\text{ mm}$$

$$H'V' = \frac{A - 1}{C} = (0.867 - 1)\cdot(-41.096\text{ mm}) = +5.466\text{ mm}$$

Chapter  11 

Waves  and  Imaging 


We now return to those thrilling days of waves to consider their effects on the performance of imaging systems. We first consider “interference” of two traveling waves that oscillate with the same frequency and then generalize that to the interference of many such waves, which is called “diffraction”.


11.1  Interference  of  Waves 

References:  Hecht,  Optics  §8 

Recall the identity that was derived for the sum of two oscillations with different frequencies $\omega_1$ and $\omega_2$:

$$y_1[t] = A\cos[\omega_1 t]$$

$$y_2[t] = A\cos[\omega_2 t]$$

$$y_1[t] + y_2[t] = 2A\cos\left[\frac{\omega_1 + \omega_2}{2}t\right]\cdot\cos\left[\frac{\omega_1 - \omega_2}{2}t\right] = 2A\cos[\omega_{avg}t]\cdot\cos[\omega_{mod}t]$$

In words, the sum of two oscillations of different frequency is identical to the product of two oscillations: one is the slower varying modulation (at frequency $\omega_{mod}$) and the other is the more rapidly oscillating average sinusoid (or carrier wave) with frequency $\omega_{avg}$. A perhaps familiar example of the modulation results from the excitation of two piano strings that are mistuned. A low-frequency oscillation (the beat) is heard; as one string is tuned to the other, the frequency of the beat decreases, reaching zero when the string frequencies are equal. Acoustic beats may be thought of as interference of the summed oscillations in time.

We also could consider this relationship in a broader sense. If the sinusoids are considered to be functions of the independent variable (coordinate) $t$, the phase angles of the two component functions $\Phi_1[t] = \omega_1 t$ and $\Phi_2[t] = \omega_2 t$ are different at the same coordinate $t$. The components sometimes add (for $t$ such that $\Phi_1[t] = \Phi_2[t] \pm 2n\pi$) and sometimes subtract (if $\Phi_1[t] = \Phi_2[t] \pm (2n+1)\pi$).
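The sum-to-product identity for beats is easy to confirm numerically. The sketch below is an illustration only; the two frequencies (440 Hz and 442 Hz) are hypothetical values chosen to resemble a pair of slightly mistuned strings.

```python
import math


def beat_sum(A, w1, w2, t):
    """Direct sum of the two oscillations."""
    return A * math.cos(w1 * t) + A * math.cos(w2 * t)


def beat_product(A, w1, w2, t):
    """Same signal written as carrier times modulation."""
    w_avg = (w1 + w2) / 2
    w_mod = (w1 - w2) / 2
    return 2 * A * math.cos(w_avg * t) * math.cos(w_mod * t)


# Hypothetical mistuned strings at 440 Hz and 442 Hz.
w1 = 2 * math.pi * 440.0
w2 = 2 * math.pi * 442.0

worst = max(
    abs(beat_sum(1.0, w1, w2, n * 1.0e-4) - beat_product(1.0, w1, w2, n * 1.0e-4))
    for n in range(10000)
)
print(worst)   # rounding error only
```

As the two frequencies approach each other, `w_mod` tends to zero and the modulation factor becomes constant, which is the disappearance of the beat described above.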
We also derived the analogous effect for two waves traveling along the $z$-axis:

$$f_1[z,t] = A\cos[k_1 z - \omega_1 t]$$

$$f_2[z,t] = A\cos[k_2 z - \omega_2 t]$$

$$f_1[z,t] + f_2[z,t] = \{2A\cos[k_{mod}z - \omega_{mod}t]\}\cdot\cos[k_{avg}z - \omega_{avg}t]$$

$$k_{mod} = \frac{k_1 - k_2}{2}, \qquad \omega_{mod} = \frac{\omega_1 - \omega_2}{2}, \qquad v_{mod} = \frac{\omega_{mod}}{k_{mod}} = \frac{\omega_1 - \omega_2}{k_1 - k_2}$$

$$k_{avg} = \frac{k_1 + k_2}{2}, \qquad \omega_{avg} = \frac{\omega_1 + \omega_2}{2}, \qquad v_{avg} = \frac{\omega_{avg}}{k_{avg}} = \frac{\omega_1 + \omega_2}{k_1 + k_2}$$

In words, the superposition of two traveling waves with different temporal frequencies (and thus different wavelengths) generates the product of two component traveling waves, one oscillating more slowly in both time and space, i.e., a traveling modulation. Note that both the average and modulation waves move along the $z$-axis. In this case, $k_1$, $k_2$, $\omega_1$, and $\omega_2$ are all positive, and so $k_{avg}$ and $\omega_{avg}$ must be also. However, the modulation wavenumber and frequency may be negative. In fact, the algebraic sign of $k_{mod}$ may be negative even if $\omega_{mod}$ is positive. In this case, the modulation wave moves in the opposite direction to the average wave.


Note that, if the two 1-D waves traveling in the same direction along the $z$-axis have the same frequency $\omega$, they must have the same wavelength $\lambda$ and the same wavenumber $k = \frac{2\pi}{\lambda}$. The modulation terms $k_{mod}$ and $\omega_{mod}$ must be zero, and the summation wave exhibits no modulation. Recall also that such waves traveling in opposite directions generate a waveform that moves but does not travel; it is a standing wave:

$$f_1[z,t] = A\cos[k_1 z - \omega_1 t]$$

$$f_2[z,t] = A\cos[k_1 z + \omega_1 t]$$

$$f_1[z,t] + f_2[z,t] = \{2A\cos[k_{mod}z - \omega_{mod}t]\}\cdot\cos[k_{avg}z - \omega_{avg}t]$$

$$k_{mod} = \frac{k_1 - k_1}{2} = 0, \qquad \omega_{mod} = \frac{\omega_1 - (-\omega_1)}{2} = \omega_1$$

$$k_{avg} = \frac{k_1 + k_1}{2} = k_1, \qquad \omega_{avg} = \frac{\omega_1 + (-\omega_1)}{2} = 0$$

$$f_1[z,t] + f_2[z,t] = 2A\cos[k_1 z]\cdot\cos[-\omega_1 t] = 2A\cos[k_1 z]\cdot\cos[\omega_1 t]$$

where the symmetry of $\cos[\theta]$ was used in the last step.
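The standing-wave result can be checked numerically, including the fact that the nodes stay fixed in space. This is a minimal sketch with arbitrary illustrative values for the wavenumber and frequency.

```python
import math


def counter_propagating_sum(A, k, w, z, t):
    """Sum of two equal waves traveling in opposite directions along z."""
    return A * math.cos(k * z - w * t) + A * math.cos(k * z + w * t)


def standing_wave(A, k, w, z, t):
    """Product form: spatial factor times temporal factor."""
    return 2 * A * math.cos(k * z) * math.cos(w * t)


k, w = 2.0, 5.0   # arbitrary illustrative wavenumber and angular frequency
worst = max(
    abs(counter_propagating_sum(1.0, k, w, 0.1 * i, 0.05 * j)
        - standing_wave(1.0, k, w, 0.1 * i, 0.05 * j))
    for i in range(50) for j in range(50)
)

# At a node, k*z = pi/2, the sum vanishes at every instant.
z_node = math.pi / (2 * k)
node_max = max(abs(counter_propagating_sum(1.0, k, w, z_node, 0.05 * j)) for j in range(50))
print(worst, node_max)
```

The fixed node is what distinguishes the standing wave from the traveling modulation of the previous case, where the zeros of the envelope move along the axis.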


Traveling waves also may be defined over two or three spatial dimensions; the waves have the form $f[x,y,t]$ and $f[x,y,z,t]$, respectively. The direction of propagation of such a wave in a multidimensional space is determined by a vector analogous to $k$: a 3-D wavevector $\mathbf{k}$ has components $[k_x, k_y, k_z]$. The vector may be written:

$$\mathbf{k} = k_x\hat{\mathbf{x}} + k_y\hat{\mathbf{y}} + k_z\hat{\mathbf{z}}$$

The corresponding wave travels in the direction of the wavevector $\mathbf{k}$ and has wavelength $\lambda = \frac{2\pi}{|\mathbf{k}|}$. In other words, the length of $\mathbf{k}$ is the magnitude of the wavevector:

$$|\mathbf{k}| = \sqrt{k_x^2 + k_y^2 + k_z^2} = \frac{2\pi}{\lambda}$$

The temporal oscillation frequency $\omega$ is determined from the magnitude of the wavevector through the dispersion relation:

$$\omega = v_\phi\cdot|\mathbf{k}| \implies v_\phi = \frac{\omega}{|\mathbf{k}|}$$


For illustration, consider a simple 2-D analogue of the 1-D traveling plane wave. The wave travels in the direction of the 2-D wavevector $\mathbf{k}$, which is in the $x$-$z$ plane:

$$\mathbf{k} = [k_x, 0, k_z]$$

The points of constant phase with phase angle $\phi = C$ radians are the set of points in the 2-D space $\mathbf{r} = [x, y = 0, z] = (r, \theta)$ such that the scalar product $\mathbf{k}\cdot\mathbf{r} = C$:

$$\mathbf{k}\cdot\mathbf{r} = \mathbf{r}\cdot\mathbf{k} = |\mathbf{k}||\mathbf{r}|\cos[\theta] = k_x x + k_z z = C \text{ for a point of constant phase}$$

Therefore, the equation of a 2-D wave traveling in the direction of $\mathbf{k}$ with linear wavefronts is:

$$f[x,z,t] = A\cos[k_x x + k_z z - \omega t] = A\cos[\mathbf{k}\cdot\mathbf{r} - \omega t]$$

In three dimensions, the set of points with the same phase lie on a planar surface so that the equation of the traveling wave is:

$$f[x,y,z,t] = f[\mathbf{r},t] = A\cos[k_x x + k_y y + k_z z - \omega t] = A\cos[\mathbf{k}\cdot\mathbf{r} - \omega t]$$


Plane  wave  traveling  in  direction  k 


This  plane  wave  could  have  been  created  by  a  point  source  at  a  large  distance  to  the 
left  and  below  the  z-axis. 

Now, we will apply the equation derived when adding oscillations with different temporal frequencies. In general, the form of the sum of two traveling waves is:

$$f_1[x,y,z,t] + f_2[x,y,z,t] = A\cos[\mathbf{k}_1\cdot\mathbf{r} - \omega_1 t] + A\cos[\mathbf{k}_2\cdot\mathbf{r} - \omega_2 t] = 2A\cos[\mathbf{k}_{avg}\cdot\mathbf{r} - \omega_{avg}t]\cdot\cos[\mathbf{k}_{mod}\cdot\mathbf{r} - \omega_{mod}t]$$

where the average and modulation wavevectors are:

$$\mathbf{k}_{avg} = \frac{\mathbf{k}_1 + \mathbf{k}_2}{2} = \frac{(k_x)_1 + (k_x)_2}{2}\hat{\mathbf{x}} + \frac{(k_y)_1 + (k_y)_2}{2}\hat{\mathbf{y}} + \frac{(k_z)_1 + (k_z)_2}{2}\hat{\mathbf{z}}$$

$$\mathbf{k}_{mod} = \frac{\mathbf{k}_1 - \mathbf{k}_2}{2} = \frac{(k_x)_1 - (k_x)_2}{2}\hat{\mathbf{x}} + \frac{(k_y)_1 - (k_y)_2}{2}\hat{\mathbf{y}} + \frac{(k_z)_1 - (k_z)_2}{2}\hat{\mathbf{z}}$$

and the average and modulation angular temporal frequencies are:

$$\omega_{avg} = \frac{\omega_1 + \omega_2}{2}, \qquad \omega_{mod} = \frac{\omega_1 - \omega_2}{2}$$

Note that the average and modulation wavevectors $\mathbf{k}_{avg}$ and $\mathbf{k}_{mod}$ point in different directions, in general, and thus the corresponding waves move in different directions at velocities determined from:

$$v_{avg} = \frac{\omega_{avg}}{|\mathbf{k}_{avg}|}, \qquad v_{mod} = \frac{\omega_{mod}}{|\mathbf{k}_{mod}|}$$


Because the phase of the multidimensional traveling wave is a function of two parameters (the wavevector $\mathbf{k}$ and the angular temporal frequency $\omega$), the phases of two traveling waves usually differ even if the temporal frequencies are equal. Consider the superposition of two such waves:

$$\omega_1 = \omega_2 \equiv \omega_0$$

The component waves travel in different directions so the components of the wavevectors differ:

$$\mathbf{k}_1 = [(k_x)_1, (k_y)_1, (k_z)_1] \neq \mathbf{k}_2 = [(k_x)_2, (k_y)_2, (k_z)_2]$$

Since the temporal frequencies are equal, so must be the wavelengths:

$$\lambda_1 = \lambda_2 \equiv \lambda \implies |\mathbf{k}_1| = |\mathbf{k}_2| \equiv |\mathbf{k}|$$

The condition of equal $\omega$ ensures that the temporal average and modulation frequencies are:

$$\omega_{avg} = \frac{\omega_1 + \omega_2}{2} = \omega_0, \qquad \omega_{mod} = \frac{\omega_1 - \omega_2}{2} = 0$$

The summation of the two traveling waves with identical magnitudes may be expressed as:

$$f_1[x,y,z,t] + f_2[x,y,z,t] = A\cos(\mathbf{k}_1\cdot\mathbf{r} - \omega_0 t) + A\cos(\mathbf{k}_2\cdot\mathbf{r} - \omega_0 t)$$

$$= 2A\cos(\mathbf{k}_{avg}\cdot\mathbf{r} - \omega_{avg}t)\cdot\cos(\mathbf{k}_{mod}\cdot\mathbf{r} - 0\cdot t)$$

$$= 2A\cos(\mathbf{k}_{avg}\cdot\mathbf{r} - \omega_{avg}t)\cdot\cos(\mathbf{k}_{mod}\cdot\mathbf{r})$$

Therefore, the superposition of two 2-D wavefronts with the same temporal frequency but traveling in different directions results in two multiplicative components: a traveling wave in the direction of $\mathbf{k}_{avg}$, and a wave in space along the direction of $\mathbf{k}_{mod}$ that does not move. This second stationary wave is analogous to the phenomenon of beats, and is called interference in optics.


11.1.1  Superposition  of  Two  Plane  Waves  of  the  Same  Frequency


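Consider the superposition of two plane waves:

$$f_1[x,y,z,t] = A\cos[\mathbf{k}_1\cdot\mathbf{r} - \omega_0 t]$$

$$f_2[x,y,z,t] = A\cos[\mathbf{k}_2\cdot\mathbf{r} - \omega_0 t]$$

$$\mathbf{k}_1 = [k_x, k_y = 0, k_z]$$

$$\mathbf{k}_2 = [-k_x, 0, k_z]$$

i.e., the wavevectors differ only in the $x$-component, and there only by a sign. Therefore the two wavevectors have the same “length”:

$$|\mathbf{k}_1| = |\mathbf{k}_2| = \frac{2\pi}{\lambda} \implies \lambda_1 = \lambda_2 \equiv \lambda$$

$$k_z = |\mathbf{k}|\cos[\theta] = \frac{2\pi}{\lambda}\cos[\theta]$$

$$k_x = |\mathbf{k}|\sin[\theta] = \frac{2\pi}{\lambda}\sin[\theta]$$

Also note that:

$$\mathbf{k}_{avg} = \frac{\mathbf{k}_1 + \mathbf{k}_2}{2} = \frac{[k_x, 0, k_z] + [-k_x, 0, k_z]}{2} = [0, 0, k_z] = \hat{\mathbf{z}}\,\frac{2\pi}{\lambda}\cos[\theta]$$

$$\mathbf{k}_{mod} = \frac{\mathbf{k}_1 - \mathbf{k}_2}{2} = \frac{[k_x, 0, k_z] - [-k_x, 0, k_z]}{2} = [k_x, 0, 0] = \hat{\mathbf{x}}\,\frac{2\pi}{\lambda}\sin[\theta]$$

$$\omega_{avg} = \frac{\omega_1 + \omega_2}{2} = \omega_0$$

$$\omega_{mod} = \frac{\omega_1 - \omega_2}{2} = 0$$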

The  wavevectors  of  two  interfering  plane  waves  with  the  same  wavelength. 


These two waves could have been generated at point sources located above and below the $z$-axis a large distance to the left. This is the classic “Young’s double slit” experiment, where light from a single source is split into two waves (spherical waves in this case) that propagate a large distance to the observation plane:

[Figure axes: $x$, $z$.]

How two “tilted” plane waves are generated in the Young double-aperture experiment. The two apertures in the opaque screen on the left divide the incoming wave into two expanding spherical waves. After propagating a long distance, the spherical waves approximate plane waves that are tilted relative to the axis by $\pm\theta$.


The “tilts” of the two waves are evaluated from the two distances:

$$\tan[\theta] = \frac{d/2}{L} = \frac{d}{2L}$$

If $L \gg d$, then

$$\theta \cong \tan[\theta] \cong \sin[\theta] \cong \frac{d}{2L}$$

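The superposition of the two electric fields is:

$$f_1[x,y,z,t] + f_2[x,y,z,t] = 2A\cos[\mathbf{k}_{avg}\cdot\mathbf{r} - \omega_{avg}t]\cdot\cos[\mathbf{k}_{mod}\cdot\mathbf{r}] = 2A\cos\left[2\pi\frac{z}{\lambda}\cos[\theta] - \omega_0 t\right]\cdot\cos\left[2\pi\frac{x}{\lambda}\sin[\theta]\right]$$

The first term (with the time dependence) is a traveling wave in the direction defined by $\mathbf{k}_{avg} = [0, 0, k_z]$, while the second term (with no dependence on time) is a spatial wave along the $x$ direction. The amplitude variation in the $x$ direction is:

$$2A\cos\left[2\pi\frac{x}{\lambda}\sin[\theta]\right] = 2A\cos\left[2\pi\frac{x}{\left(\frac{\lambda}{\sin[\theta]}\right)}\right]$$

which has a period of $\frac{\lambda}{\sin[\theta]}$. The irradiance (the measurable intensity) of the superposition is:

$$|f[x,y,z,t]|^2 = 4A^2\cos^2\left[2\pi\frac{z}{\lambda}\cos[\theta] - \omega_0 t\right]\cdot\cos^2\left[\frac{2\pi x\sin[\theta]}{\lambda}\right]$$

The second cosine term can be rewritten using:

$$\cos^2[\theta] = \frac{1}{2}(1 + \cos[2\theta])$$

As before, the first term varies rapidly due to the angular frequency term $\omega_0 \cong 10^{14}$ Hz. Therefore, just the average value is detected:

$$\langle|f[x,y,z,t]|^2\rangle = 4A^2\cdot\frac{1}{2}\cdot\cos^2\left[\frac{2\pi x\sin[\theta]}{\lambda}\right] = 2A^2\cdot\frac{1}{2}\left(1 + \cos\left[\frac{4\pi x\sin[\theta]}{\lambda}\right]\right) = A^2\left(1 + \cos\left[2\pi\frac{x}{\left(\frac{\lambda}{2\sin[\theta]}\right)}\right]\right)$$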

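This derivation may also be applied to find the irradiance of one of the individual component waves:

$$I_1 = \langle|f_1[x,y,z,t]|^2\rangle$$

$$I_2 = \langle|f_2[x,y,z,t]|^2\rangle$$

$$I_0 \equiv \langle|f_1[x,y,z,t]|^2\rangle = \langle|A\cos[\mathbf{k}_1\cdot\mathbf{r} - \omega_0 t]|^2\rangle = A^2\langle\cos^2[\mathbf{k}_1\cdot\mathbf{r} - \omega_0 t]\rangle = \frac{A^2}{2}$$

So the irradiance of the sum of the two waves can be rewritten in terms of the irradiance of a single wave:

$$\langle|f[x,y,z,t]|^2\rangle = 4I_0\cos^2\left[\frac{2\pi x\sin[\theta]}{\lambda}\right] = 2I_0\left(1 + \cos\left[\frac{2\pi x\cdot 2\cdot\sin[\theta]}{\lambda}\right]\right) = 2I_0\left(1 + \cos\left[2\pi\frac{x}{\left(\frac{\lambda}{2\sin[\theta]}\right)}\right]\right)$$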


The irradiance exhibits a sinusoidal modulation of period $X = \frac{\lambda}{2\sin[\theta]}$ and oscillates between 0 and $2I_0\cdot(2) = 4I_0$, so that the average irradiance is $2I_0$. The period varies directly with $\lambda$ and inversely with $\sin[\theta]$; for small $\theta$, the period of the sinusoid is large, and for $\theta = 0$ there is no modulation of the irradiance. The alternating bright and dark regions of this time-stationary sinusoidal intensity pattern often are called interference fringes. The shape, separation, and orientation of the interference fringes are determined by the incident wavefronts, and thus provide information about them. The argument of the cosine function is the optical phase difference of the two waves. At locations where the optical phase difference is an even multiple of $\pi$, the cosine evaluates to unity and a maximum of the interference pattern results. This is an example of constructive interference. If the optical phase difference is an odd multiple of $\pi$, the cosine evaluates to -1 and the irradiance is zero; this is destructive interference.
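The time-averaged fringe formula can be compared against a direct numerical average of the squared field. The sketch below is an illustration under stated assumptions: the wavelength, tilt angle, and amplitude are arbitrary values, and the average is taken over exactly one temporal period with uniform samples.

```python
import math


def field(x, z, t, A, wavelength, theta, omega):
    """Sum of two plane waves tilted by +theta and -theta in the x-z plane."""
    k = 2 * math.pi / wavelength
    kx, kz = k * math.sin(theta), k * math.cos(theta)
    f1 = A * math.cos(kx * x + kz * z - omega * t)
    f2 = A * math.cos(-kx * x + kz * z - omega * t)
    return f1 + f2


def time_averaged_irradiance(x, z, A, wavelength, theta, omega, samples=2000):
    """Average of the squared field over one temporal period."""
    period = 2 * math.pi / omega
    total = 0.0
    for i in range(samples):
        total += field(x, z, period * i / samples, A, wavelength, theta, omega) ** 2
    return total / samples


def predicted_irradiance(x, A, wavelength, theta):
    """2*I0*(1 + cos[2*pi*x*2*sin(theta)/wavelength]) with I0 = A^2/2."""
    I0 = A ** 2 / 2
    return 2 * I0 * (1 + math.cos(2 * math.pi * x * 2 * math.sin(theta) / wavelength))


# Arbitrary illustrative values (not from the text).
A, wavelength, theta, omega = 1.5, 0.5, math.radians(10), 3.0
worst = max(
    abs(time_averaged_irradiance(0.07 * i, 1.3, A, wavelength, theta, omega)
        - predicted_irradiance(0.07 * i, A, wavelength, theta))
    for i in range(40)
)
fringe_period = wavelength / (2 * math.sin(theta))
print(worst, fringe_period)
```

The result is independent of the observation distance `z`, which is the numerical counterpart of the statement that the traveling factor averages to a constant while the stationary factor survives.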


[Figure panels: Traveling Wave, Standing Wave, Irradiance; axis $z$.]

Interference of two “tilted plane waves” with the same wavelength. The two component traveling waves are shown as “snapshots” at one instant of time on the left (white = 1, black = -1); the sum of the two is shown in the center (white = 2, black = -2), and the squared magnitude on the right (white = 4, black = 0). The modulation in the vertical direction is constant, while that in the horizontal direction is a traveling wave and “averages” out to a constant value of $\frac{1}{2}$.

The amplitude and irradiance observed at one instant of time when the irradiance at the origin (“on axis”) is a maximum is shown:

[Figure panels (a) and (b); horizontal axes: $x$ (units of $\lambda/\sin[\theta]$).]

Interference patterns observed along the $x$-axis at one value of $z$: (a) amplitude fringes, with period equal to $\frac{\lambda}{\sin[\theta]}$; (b) irradiance (intensity) fringes, with period equal to $\frac{\lambda}{2\sin[\theta]}$. This pattern is averaged over time and scales by a factor of $\frac{1}{2}$.

Again, the traveling wave in the images of the amplitude and intensity of the superposed images moves in the $z$-direction (to the right), thus blurring out the oscillations in the $z$-direction. The oscillations in the $x$-direction are preserved as the interference pattern, which is plotted as a function of $x$ below. Note that the spatial frequency of the intensity fringes is twice as large as that of the amplitude fringes.


Irradiance  patterns  observed  at  the  output  plane  at  several  instants  of  time,  showing 
that  the  spatial  variation  of  the  irradiance  is  preserved  but  the  averaging  reduces  the 

maximum  value  by  half. 
11.1.2  Superposition  of  Two  Plane  Waves  with  Different  Frequencies


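For further illustration, consider the case where the two waves travel in the same directions as before, so that $\mathbf{k}_1 \neq \mathbf{k}_2$, but with different temporal frequencies $\omega_1 \neq \omega_2$. This means that $|\mathbf{k}_1| \neq |\mathbf{k}_2|$. The average and modulation wavevectors are found as before, but the modulation wave now travels because both $\mathbf{k}_{mod} \neq 0$ and $\omega_{mod} \neq 0$. Consider the example of two component waves: $f_1[\mathbf{r},t]$ directed at an angle $\theta_1 = +40^\circ = \frac{2\pi}{9}$ radian with $\lambda_1 = 8$ units and $\omega_1 = \frac{\pi}{4}$ radians/second, and $f_2[\mathbf{r},t]$ directed at $\theta_2 = -40^\circ = -\frac{2\pi}{9}$ radian with $\lambda_2 = 12$ units and $\omega_2 = \frac{\pi}{6}$ radians/second. The corresponding average and modulation frequencies are:

$$\omega_{avg} = \frac{\omega_1 + \omega_2}{2} = \frac{2\pi}{2}\left(\frac{1}{8} + \frac{1}{12}\right) = \frac{2\pi}{9.6} = \frac{5\pi}{24}\text{ radians/second}$$

$$\omega_{mod} = \frac{\omega_1 - \omega_2}{2} = \frac{2\pi}{2}\left(\frac{1}{8} - \frac{1}{12}\right) = \frac{2\pi}{48} = \frac{\pi}{24}\text{ radians/second}$$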

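and the average and modulation wavevectors are:

$$\mathbf{k}_{avg} = \frac{\mathbf{k}_1 + \mathbf{k}_2}{2} = 2\pi\left(\hat{\mathbf{x}}\,\frac{\sin[40^\circ]}{48} + \hat{\mathbf{z}}\,\frac{5\cos[40^\circ]}{48}\right) = \frac{2\pi}{48}\cdot\left(\hat{\mathbf{x}}\sin[40^\circ] + 5\hat{\mathbf{z}}\cos[40^\circ]\right) = 2\pi\left(\frac{\hat{\mathbf{x}}}{74.674} + \frac{\hat{\mathbf{z}}}{12.532}\right)$$

$$\mathbf{k}_{mod} = \frac{\mathbf{k}_1 - \mathbf{k}_2}{2} = \frac{2\pi}{48}\cdot\left(5\hat{\mathbf{x}}\sin[40^\circ] + \hat{\mathbf{z}}\cos[40^\circ]\right) = 2\pi\left(\frac{\hat{\mathbf{x}}}{14.935} + \frac{\hat{\mathbf{z}}}{62.660}\right)$$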

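The superposition may be written as the product of the average and modulation waves:

$$f_1[\mathbf{r},t] + f_2[\mathbf{r},t] = 2 f_{avg}[\mathbf{r},t]\cdot f_{mod}[\mathbf{r},t]$$

where the full expressions for the average and modulation waves are:

$$f_{avg}[\mathbf{r},t] = \cos[\mathbf{k}_{avg}\cdot\mathbf{r} - \omega_{avg}t] = \cos\left[\frac{2\pi}{48}\cdot(x\sin[40^\circ] + 5z\cos[40^\circ]) - \frac{5\pi t}{24}\right]$$

$$= \cos\left[\frac{2\pi}{48}(x\sin[40^\circ] + 5z\cos[40^\circ] - 5t)\right] = \cos\left[2\pi\left(\frac{x}{74.674} + \frac{z}{12.532}\right) - 2\pi\frac{t}{9.6}\right]$$

$$f_{mod}[\mathbf{r},t] = \cos[\mathbf{k}_{mod}\cdot\mathbf{r} - \omega_{mod}t] = \cos\left[\frac{2\pi}{48}\cdot(5x\sin[40^\circ] + z\cos[40^\circ]) - \frac{2\pi t}{48}\right]$$

$$= \cos\left[\frac{2\pi}{48}(5x\sin[40^\circ] + z\cos[40^\circ] - t)\right] = \cos\left[2\pi\left(\frac{x}{14.935} + \frac{z}{62.660}\right) - \frac{\pi t}{24}\right]$$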

Note that both the average and modulation waves are traveling waves; they are headed in different directions with different frequencies and different velocities. The temporal frequencies are $\nu_{avg} = \frac{5}{48}$ Hz and $\nu_{mod} = \frac{1}{48}$ Hz. If the intensity (squared-magnitude) of the sum is averaged over time at an observation plane located downstream on the $z$-axis, both traveling waves will average out and no stationary fringe pattern will be visible.
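The numbers in this example can be regenerated from the two wavelengths and the tilt angle. The sketch assumes unit phase velocity, so that each angular frequency is 2π divided by the wavelength, which is consistent with the quoted average frequency of 5π/24 radians/second; the variable names are illustrative.

```python
import math

theta = math.radians(40)
lam1, lam2 = 8.0, 12.0

# Wavevectors as (kx, kz) pairs; the two waves are tilted by +40 and -40 degrees.
k1 = (2 * math.pi / lam1 * math.sin(theta), 2 * math.pi / lam1 * math.cos(theta))
k2 = (-2 * math.pi / lam2 * math.sin(theta), 2 * math.pi / lam2 * math.cos(theta))

k_avg = tuple((a + b) / 2 for a, b in zip(k1, k2))
k_mod = tuple((a - b) / 2 for a, b in zip(k1, k2))

# Spatial periods along x and z, i.e. the denominators in 2*pi*(x/Px + z/Pz).
periods_avg = tuple(2 * math.pi / c for c in k_avg)
periods_mod = tuple(2 * math.pi / c for c in k_mod)

# Assumed unit phase velocity: omega = 2*pi / wavelength.
w1, w2 = 2 * math.pi / lam1, 2 * math.pi / lam2
w_avg, w_mod = (w1 + w2) / 2, (w1 - w2) / 2

print(periods_avg)   # roughly (74.67, 12.53)
print(periods_mod)   # roughly (14.93, 62.66)
print(w_avg / (2 * math.pi), w_mod / (2 * math.pi))   # 5/48 Hz and 1/48 Hz
```

Because `w_mod` is nonzero, the modulation factor depends on time as well as position, which is why the fringes drift and vanish in the time average.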


[Figure panels: component waves, amplitude sum, and irradiance; axis $z$.]

Sum of two sinusoidal traveling waves where the periods are related by $\lambda_2 = \frac{3}{2}\lambda_1$. The two waves travel in the directions $\pm 40^\circ$, respectively. The resulting amplitude sum and power are depicted as “snapshots” at one instant of time. Since the modulation wave now travels too, both waves are averaged to constant values and no fringes are visible.


The same principles just discussed may be used to determine the form of interference fringes from wavefronts with other shapes. Some examples will be considered in the following sections.




(a)  (b) 


Figure 11.1: Intensity patterns observed at the output plane at several instants of time. The velocity of the modulation wave makes this pattern “migrate” towards $-x$, and thus the time-averaged pattern is a constant; no interference is seen.


11.1.3  Fringe  Visibility  —  Coherence 


The visibility of a sinusoidal fringe pattern is a quality that corresponds quite closely to modulation, which is a term used by electrical engineers (sparkies). Given a nonnegative sinusoidal irradiance (intensity) distribution with maximum $I_{max}$ and minimum $I_{min}$ (so that $I_{min} \geq 0$), the visibility of the sinusoidal fringe pattern is:

$$V = \frac{I_{max} - I_{min}}{I_{max} + I_{min}}$$

Note that if $I_{min} = 0$, then $V = 1$ regardless of the value of $I_{max}$. The visibility of the fringe pattern is largely determined by the relative irradiances of the individual wavefronts and by the coherence of the light source.
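The definition of visibility translates directly into a small function. This is a minimal sketch; the function name and the input checks are illustrative choices rather than anything prescribed by the text.

```python
def visibility(i_max, i_min):
    """Fringe visibility V = (Imax - Imin) / (Imax + Imin)."""
    if i_min < 0 or i_max < i_min:
        raise ValueError("require 0 <= Imin <= Imax")
    if i_max + i_min == 0:
        raise ValueError("visibility is undefined for zero irradiance")
    return (i_max - i_min) / (i_max + i_min)


print(visibility(4.0, 0.0))   # 1.0: fringes of two equal-amplitude waves
print(visibility(3.0, 1.0))   # 0.5
print(visibility(2.0, 2.0))   # 0.0: uniform irradiance, no fringes
```

The first call corresponds to the pattern derived earlier, which oscillates between 0 and four times the single-beam irradiance.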

To introduce the concept of coherence, consider first the Young’s two-aperture experiment where the source is composed of equal-amplitude emission at two distinct wavelengths $\lambda_1$ and $\lambda_2$ incident on the observation screen at $\pm\theta$. Possible pairs of wavelengths could be those of the sodium doublet ($\lambda = 589.0$ nm and 589.6 nm), or the pair of lines emitted by a “greenie” He:Ne laser ($\lambda = 543$ nm (green), 594 nm (yellow)). In air or vacuum, the corresponding angular frequencies obviously are $\omega_1 = \frac{2\pi c}{\lambda_1}$ and $\omega_2 = \frac{2\pi c}{\lambda_2}$.

To find the irradiance pattern created by the interference of the four beams, we must compute the superposition of the amplitude of the electromagnetic field, find its squared-magnitude, and then determine the average over time. The sum of the four component terms is straightforward to compute by recognizing that it is the sum of the amplitude patterns from the pairs of waves with the same wavelength. We have already shown that the sum of the two terms with $\lambda = \lambda_1$ is:

$$f_1[x,z;\lambda_1] + f_2[x,z;\lambda_1] = 2A\cos\left[\frac{2\pi x}{\lambda_1}\sin[\theta]\right]\cos\left[\frac{2\pi z}{\lambda_1}\cos[\theta] - \omega_1 t\right] = 2A\cos\left[2\pi\frac{x}{\left(\frac{\lambda_1}{\sin[\theta]}\right)}\right]\cos\left[\frac{2\pi z}{\lambda_1}\cos[\theta] - \omega_1 t\right]$$

which is the product of a stationary sinusoid and a traveling wave in the $+z$-direction. If we add a second pair of plane waves with different wavelength $\lambda = \lambda_2$ but the same “tilts,” the amplitude pattern can also be calculated. We have to add the amplitude of four waves, but we can still add them in pairs. The first pair produces the same amplitude pattern that we saw before. The second pair also produces a pattern that differs only in the periods of the sinusoids.


[Figure: two apertures at $\pm d/2$ about the $z$-axis.]

The sum of four tilted plane waves can be calculated by summing the pair due to one wavelength and that due to the other.


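The second pair of wavefronts with $\lambda = \lambda_2$ yield a similar result, though the period of the stationary fringes and the temporal frequency of the traveling wave differ. The expression for the sum of the two pairs is:

$$\sum_{n=1}^{4} f_n[x,z;\lambda] = f_1[x,z;\lambda_1] + f_2[x,z;\lambda_1] + f_1[x,z;\lambda_2] + f_2[x,z;\lambda_2]$$

$$= 2A\cos\left[2\pi\frac{x}{\left(\frac{\lambda_1}{\sin[\theta]}\right)}\right]\cos\left[\left(\frac{2\pi}{\lambda_1}\cos[\theta]\right)z - \omega_1 t\right] + 2A\cos\left[2\pi\frac{x}{\left(\frac{\lambda_2}{\sin[\theta]}\right)}\right]\cos\left[\left(\frac{2\pi}{\lambda_2}\cos[\theta]\right)z - \omega_2 t\right]$$

$$= 2A\cos\left[2\pi\frac{x}{\left(\frac{\lambda_1}{\sin[\theta]}\right)}\right]\cos\left[k_1\left(z - \frac{\omega_1}{k_1}t\right)\right] + 2A\cos\left[2\pi\frac{x}{\left(\frac{\lambda_2}{\sin[\theta]}\right)}\right]\cos\left[k_2\left(z - \frac{\omega_2}{k_2}t\right)\right]$$

$$= 2A\left\{\cos\left[2\pi\frac{x}{\left(\frac{\lambda_1}{\sin[\theta]}\right)}\right]\cos[k_1(z - v_1 t)] + \cos\left[2\pi\frac{x}{\left(\frac{\lambda_2}{\sin[\theta]}\right)}\right]\cos[k_2(z - v_2 t)]\right\}$$

where:

$$k_1 = \frac{2\pi}{\lambda_1}\cos[\theta], \qquad k_2 = \frac{2\pi}{\lambda_2}\cos[\theta]$$

$$v_1 = \frac{\omega_1}{k_1} = \frac{2\pi\nu_1}{\frac{2\pi}{\lambda_1}\cos[\theta]} = \frac{\nu_1\lambda_1}{\cos[\theta]} = \frac{c}{\cos[\theta]}$$

$$v_2 = \frac{\omega_2}{k_2} = \frac{2\pi\nu_2}{\frac{2\pi}{\lambda_2}\cos[\theta]} = \frac{\nu_2\lambda_2}{\cos[\theta]} = \frac{c}{\cos[\theta]} = v_1$$

$$\implies v_2 = v_1$$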

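Thus both traveling waves propagate down the $z$-axis with the same velocity, and that term may be factored out:

$$\sum_{n=1}^{4} f_n[x,z;\lambda] = 2A\left\{\cos\left[2\pi\frac{x}{\left(\frac{\lambda_1}{\sin[\theta]}\right)}\right] + \cos\left[2\pi\frac{x}{\left(\frac{\lambda_2}{\sin[\theta]}\right)}\right]\right\}\cos\left[\frac{2\pi z}{\lambda_1}\cos[\theta] - \omega_1 t\right]$$

The squared magnitude of the amplitude is:

$$\left|\sum_{n=1}^{4} f_n[x,z;\lambda]\right|^2 = 4A^2\left\{\cos\left[2\pi\frac{x}{\left(\frac{\lambda_1}{\sin[\theta]}\right)}\right] + \cos\left[2\pi\frac{x}{\left(\frac{\lambda_2}{\sin[\theta]}\right)}\right]\right\}^2\cos^2\left[\frac{2\pi z}{\lambda_1}\cos[\theta] - \omega_1 t\right]$$

and the time average yields the irradiance:

$$I[x,z] = \left\langle\left|\sum_{n=1}^{4} f_n[x,z;\lambda]\right|^2\right\rangle = 4A^2\left\{\cos\left[2\pi\frac{x}{\left(\frac{\lambda_1}{\sin[\theta]}\right)}\right] + \cos\left[2\pi\frac{x}{\left(\frac{\lambda_2}{\sin[\theta]}\right)}\right]\right\}^2\left\langle\cos^2\left[\frac{2\pi z}{\lambda_1}\cos[\theta] - \omega_1 t\right]\right\rangle$$

$$= 2A^2\left\{\cos\left[2\pi\frac{x}{\left(\frac{\lambda_1}{\sin[\theta]}\right)}\right] + \cos\left[2\pi\frac{x}{\left(\frac{\lambda_2}{\sin[\theta]}\right)}\right]\right\}^2$$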


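The sum of the two stationary cosine waves also may be recast as the product of cosines with the average and modulation frequencies:

$$2A\left\{\cos\left[2\pi\frac{x}{\left(\frac{\lambda_1}{\sin[\theta]}\right)}\right] + \cos\left[2\pi\frac{x}{\left(\frac{\lambda_2}{\sin[\theta]}\right)}\right]\right\} = 4A\cos\left[2\pi x\sin[\theta]\cdot\frac{\frac{1}{\lambda_1} + \frac{1}{\lambda_2}}{2}\right]\cdot\cos\left[2\pi x\sin[\theta]\cdot\frac{\frac{1}{\lambda_1} - \frac{1}{\lambda_2}}{2}\right]$$

$$= 4A\cos\left[\pi x\sin[\theta]\left(\frac{\lambda_1 + \lambda_2}{\lambda_1\lambda_2}\right)\right]\cdot\cos\left[\pi x\sin[\theta]\left(\frac{\lambda_1 - \lambda_2}{\lambda_1\lambda_2}\right)\right]$$

$$= 4A\cos\left[\frac{2\pi x\sin[\theta]}{\lambda_1\lambda_2}\lambda_{avg}\right]\cdot\cos\left[\frac{2\pi x\sin[\theta]}{\lambda_1\lambda_2}\lambda_{mod}\right]$$

where $\lambda_{avg} = \frac{\lambda_1 + \lambda_2}{2}$ and $\lambda_{mod} = \frac{\lambda_1 - \lambda_2}{2}$.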


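The final expression for the irradiance is the product of two sinusoidal irradiance patterns with identical maxima and zero minima:

$$I[x,z] = 2A^2\left\{2\cos\left[\frac{2\pi x\sin[\theta]}{\lambda_1\lambda_2}\lambda_{avg}\right]\cdot\cos\left[\frac{2\pi x\sin[\theta]}{\lambda_1\lambda_2}\lambda_{mod}\right]\right\}^2 = 8A^2\cos^2\left[\frac{2\pi x\sin[\theta]}{\lambda_1\lambda_2}\lambda_{avg}\right]\cdot\cos^2\left[\frac{2\pi x\sin[\theta]}{\lambda_1\lambda_2}\lambda_{mod}\right]$$

$$= 8A^2\cdot\frac{1}{2}\left(1 + \cos\left[\frac{4\pi x\sin[\theta]}{\lambda_1\lambda_2}\lambda_{avg}\right]\right)\cdot\frac{1}{2}\left(1 + \cos\left[\frac{4\pi x\sin[\theta]}{\lambda_1\lambda_2}\lambda_{mod}\right]\right)$$

$$= 2A^2\cdot\left(1 + \cos\left[2\pi\frac{x}{D_{avg}}\right]\right)\cdot\left(1 + \cos\left[2\pi\frac{x}{D_{mod}}\right]\right)$$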

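where the respective periods of the two oscillations are defined to be:

$$D_{avg} \equiv \frac{\lambda_1\lambda_2}{2\lambda_{avg}\sin[\theta]} = \frac{1}{\sin[\theta]}\left(\frac{1}{\lambda_1} + \frac{1}{\lambda_2}\right)^{-1} \propto |\lambda_1 + \lambda_2|^{-1} \propto (\lambda_{avg})^{-1}$$

$$D_{mod} \equiv \frac{\lambda_1\lambda_2}{2|\lambda_{mod}|\sin[\theta]} = \frac{1}{\sin[\theta]}\left|\frac{1}{\lambda_1} - \frac{1}{\lambda_2}\right|^{-1} \propto |\lambda_1 - \lambda_2|^{-1} = (\Delta\lambda)^{-1}$$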

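Note that the spatial periods of the oscillations are proportional to $\lambda_{avg}^{-1}\propto(\lambda_1+\lambda_2)^{-1}$ and $\lambda_{mod}^{-1}\propto|\lambda_1-\lambda_2|^{-1}=(\Delta\lambda)^{-1}$. In the case where the two emitted wavelengths are close together such that $\lambda_1\cong\lambda_2\cong\lambda_{avg}\gg\lambda_{mod}$, the expressions for the periods of the two component oscillations may be simplified:

\[
D_{avg}\cong\frac{L\,\lambda_{avg}}{2d}
\]
\[
D_{mod}\cong\frac{L\,(\lambda_{avg})^2}{d\cdot\Delta\lambda}
\]

After cancelling the common terms, the relative lengths of the spatial periods of the modulations are:

\[
D_{avg}\cong\frac{\lambda_{mod}}{\lambda_{avg}}\cdot D_{mod}\ll D_{mod}\quad(\text{when }\Delta\lambda\ll\lambda_{avg})
\]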

In words, the period of the modulation due to $\lambda_{mod}$ is much longer than that due to $\lambda_{avg}$ if the emitted wavelengths are approximately equal. The period $D_{mod}$ limits the range of x over which the short-period fringes can be seen. In fact, the sinusoidal fringes due to $\lambda_{avg}$ are visible over a range of x equal to the distance between adjacent zeros of the $D_{mod}$ modulation, i.e., half the period $D_{mod}$. The pattern resulting from the example considered above is shown. Note that the amplitude of the maxima of the irradiance fringes decreases away from the center of the observation screen, where the optical path lengths are equal for all four beams, and thus where they will add constructively.
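The relation between the short-period fringes and the long-period envelope can be checked numerically. The sketch below is only illustrative: it assumes the irradiance has the form $2A^2(\cos[2\pi x\sin[\theta]/\lambda_1]+\cos[2\pi x\sin[\theta]/\lambda_2])^2$ and takes the periods as $D_{avg}=\frac{1}{\sin[\theta]}(1/\lambda_1+1/\lambda_2)^{-1}$ and $D_{mod}=\frac{1}{\sin[\theta]}|1/\lambda_1-1/\lambda_2|^{-1}$. The function names and the sample wavelengths are hypothetical choices, not values from the text.

```python
import math


def fringe_periods(lam1, lam2, theta):
    """Return (D_avg, D_mod): periods of the fast fringes and the slow envelope."""
    s = math.sin(theta)
    d_avg = 1.0 / (s * (1.0 / lam1 + 1.0 / lam2))
    d_mod = 1.0 / (s * abs(1.0 / lam1 - 1.0 / lam2))
    return d_avg, d_mod


def irradiance_sum_form(x, lam1, lam2, theta, amplitude=1.0):
    """Irradiance written as the squared sum of the two stationary cosines."""
    s = math.sin(theta)
    c1 = math.cos(2.0 * math.pi * x * s / lam1)
    c2 = math.cos(2.0 * math.pi * x * s / lam2)
    return 2.0 * amplitude ** 2 * (c1 + c2) ** 2


def irradiance_product_form(x, lam1, lam2, theta, amplitude=1.0):
    """Same irradiance written as the product of two raised cosines."""
    d_avg, d_mod = fringe_periods(lam1, lam2, theta)
    fast = 1.0 + math.cos(2.0 * math.pi * x / d_avg)
    slow = 1.0 + math.cos(2.0 * math.pi * x / d_mod)
    return 2.0 * amplitude ** 2 * fast * slow


if __name__ == "__main__":
    # Illustrative values only: two lines 10 nm apart, beams tilted by 0.01 rad.
    lam1, lam2, theta = 500e-9, 510e-9, 0.01
    d_avg, d_mod = fringe_periods(lam1, lam2, theta)
    print("D_avg [m]:", d_avg)
    print("D_mod [m]:", d_mod)
    print("fringes visible over about D_mod / 2 [m]:", d_mod / 2.0)
```

With nearly equal wavelengths the ratio D_avg / D_mod equals $|\lambda_{mod}|/\lambda_{avg}$, so the envelope spans many short-period fringes.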


Interference patterns from two wavelengths with the same input amplitude: (a) the two amplitude patterns differ in period in proportion to the wavelength; (b) the sum of the two amplitude patterns at one instant of time; (c) the squared magnitude at the same instant, showing that the amplitude of the fringe varies with x.

In this case where $\Delta\lambda\ll\lambda_{avg}$, the fringes are visible over a large interval of x. We speak of such a light source as temporally coherent; the phase difference of light emitted at $\lambda_1$ and at $\lambda_2$ changes slowly with time, and thus with the position along the x-axis. Therefore, fringes are visible over a large range of x. On the other hand, if $\Delta\lambda$ is of the same order as $\lambda_{avg}$, the wavelengths are widely separated. The phase difference of light emitted at the two extreme wavelengths changes rapidly with time, and thus with position along the x-axis. The fringes are visible only where the phase difference remains approximately constant (for $x\cong 0$) over the averaged time interval. Such a source is said to be temporally incoherent. It is difficult (though not impossible) to see fringes generated by an incoherent source.


11.1.4  Coherence  Time  and  Coherence  Length 

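If two wavelengths emitted by the source are separated by $\Delta\lambda$, the corresponding frequency difference often is called the bandwidth of the source:

\[
|\nu_1-\nu_2|=\left|\frac{c}{\lambda_1}-\frac{c}{\lambda_2}\right|=c\cdot\frac{|\lambda_2-\lambda_1|}{\lambda_1\cdot\lambda_2}=c\cdot\frac{\Delta\lambda}{\lambda_1\cdot\lambda_2}
\]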

If the source includes a third wavelength $\lambda_3$ midway between the extrema $\lambda_1$ and $\lambda_2$ (so that $\lambda_3=\lambda_{avg}$), the factors $\Delta\lambda$ and $\lambda_{avg}$ are unchanged, but the irradiance pattern must be different in some way. The irradiance pattern generated by this three-line source is more difficult to calculate, but the result can be modeled easily by recognizing that the wavefronts generated by $\lambda_3$ through the two apertures combine in amplitude to create a third pattern of sinusoidal fringes with a spatial period between those due to $\lambda_1$ and $\lambda_2$. The three such patterns may be summed and squared to model the irradiance fringes. Consider first the individual fringe patterns due to the extrema $\lambda_1$ and $\lambda_2$ as shown in (a):

(a) Irradiance pattern resulting from two wavelengths with equal "powers", showing the long-period fringes due to $\Delta\lambda$ and the short-period fringes due to $\lambda_{avg}$; (b) Irradiance pattern after a third wavelength is added at $\lambda_{avg}$ with the same "power". The distance between peaks of the fringe pattern has increased.

The irradiance pattern generated from the superposition of these fringe patterns exhibits the short-period fringes due to $\lambda_{avg}$ and the long-period fringes due to $\Delta\lambda$.

If we add a third fringe pattern due to $\lambda_3=\lambda_{avg}$, the resulting irradiance fringe pattern is shown in (b). Note that the region of visible fringes covers approximately the same extent of the x-axis, but the distance between such regions has increased. By extension, if all wavelengths between $\lambda_1$ and $\lambda_2$ are included in the source, visible fringes will exist only in one region centered about $x=0$.

Because the region of fringes created by the three-line source is similar in size to that from the two-line source, but (infinitely) much smaller than the region of interference from the single-line source, we say that the light from the first two sources is equally coherent, but less coherent than the light emitted by the single-line source. The coherence may be quantified based on the temporal bandwidth. For a source whose range of emitted wavelengths is:

\[
\Delta\lambda\equiv\lambda_{max}-\lambda_{min},
\]

the corresponding temporal bandwidth is:

\[
\Delta\nu=c\cdot\frac{\Delta\lambda}{\lambda_{max}\cdot\lambda_{min}}
\]

Note that the dimensions of $\Delta\nu$ are (time)$^{-1}$. The time delay over which the phase difference of light emitted from one source point is predictable (and thus over which fringes may be generated) is the inverse of this bandwidth:

\[
\Delta\tau=\frac{1}{\Delta\nu}=\frac{\lambda_{max}\cdot\lambda_{min}}{c\cdot\Delta\lambda}
\]

which is called the coherence time. Obviously, if $\Delta\lambda$ is large, then so is $\Delta\nu$ and the corresponding coherence time is small. The coherence length is the distance traveled by an electromagnetic wave during the coherence time:

\[
c\cdot\Delta\tau\equiv\Delta\ell=\frac{c}{\Delta\nu}=\frac{\lambda_{max}\cdot\lambda_{min}}{\Delta\lambda}
\]


and is a measure of the length of the electromagnetic wave packet over which the phase difference is predictable. Recall that for interference of waves from a source with wavelength range $\Delta\lambda$, the range of coordinate x over which fringes are visible is half the period $D_{mod}$:

\[
\frac{D_{mod}}{2}=\frac{1}{2\cdot\sin[\theta]}\left(\frac{1}{\lambda_{min}}-\frac{1}{\lambda_{max}}\right)^{-1}
=\frac{1}{2\cdot\sin[\theta]}\cdot\frac{\lambda_{max}\lambda_{min}}{\Delta\lambda}
=\frac{1}{2\cdot\sin[\theta]}\left(\frac{c}{\Delta\nu}\right)
=\frac{\Delta\ell}{2\cdot\sin[\theta]}
\]

Thus the range of x over which fringes are visible is proportional to the coherence length.
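As a rough numerical illustration of these definitions, the sketch below evaluates the bandwidth, coherence time, coherence length, and the visible fringe range $\Delta\ell/(2\sin[\theta])$. The function names are hypothetical, and the two wavelengths (589.0 nm and 589.6 nm, roughly the sodium doublet) and the angle are assumed sample values, not data from the text.

```python
import math

C = 299_792_458.0  # speed of light in vacuum [m/s]


def bandwidth(lam_min, lam_max):
    """Temporal bandwidth: c * (lam_max - lam_min) / (lam_max * lam_min) [Hz]."""
    return C * (lam_max - lam_min) / (lam_max * lam_min)


def coherence_time(lam_min, lam_max):
    """Coherence time: the inverse of the bandwidth [s]."""
    return 1.0 / bandwidth(lam_min, lam_max)


def coherence_length(lam_min, lam_max):
    """Coherence length: distance traveled during the coherence time [m]."""
    return C * coherence_time(lam_min, lam_max)


def visible_fringe_range(lam_min, lam_max, theta):
    """Range of x over which fringes are visible: D_mod / 2 [m]."""
    return coherence_length(lam_min, lam_max) / (2.0 * math.sin(theta))


if __name__ == "__main__":
    lam_min, lam_max = 589.0e-9, 589.6e-9  # assumed two-line source
    print("bandwidth [Hz]:", bandwidth(lam_min, lam_max))
    print("coherence time [s]:", coherence_time(lam_min, lam_max))
    print("coherence length [m]:", coherence_length(lam_min, lam_max))
    print("fringe range [m] at theta = 0.01 rad:",
          visible_fringe_range(lam_min, lam_max, 0.01))
```

Narrowing the wavelength range lengthens the coherence length in inverse proportion, which is why a single-line source gives fringes over the whole screen.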

Lasers are the best available coherent sources; to a very good approximation, most lasers emit a single wavelength $\lambda_0$ so that $\Delta\lambda\cong 0$. The period of the modulating sinusoid is then $D_{mod}=\infty$, and fringes are visible at all x. A coherent source should be employed when an optical interference pattern is used to measure a parameter of the system, such as the optical image quality (as was used to test the Hubble Space Telescope). Thus the range of x over which fringes are visible is determined by the coherence length (and thus the bandwidth) of the source. Therefore, the visibility of the interference fringes may be used as a measure of the source coherence.


11.1.5 Effect of Polarization of Electric Field on Fringe Visibility

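Up to this point, we have ignored the effect of the orientation of the electric field vectors on the sum of the fields. In fact, this is an essential consideration; two orthogonal electric field vectors cannot add to generate a time-invariant modulation in the irradiance. Consider the sum of two electric field vectors $\mathbf{E}_1$ and $\mathbf{E}_2$ to generate a field $\mathbf{E}$. The resulting irradiance is:

\[
I=\mathbf{E}\cdot\mathbf{E}=|\mathbf{E}|^2=|\mathbf{E}_1+\mathbf{E}_2|^2
\]
\[
=(\mathbf{E}_1+\mathbf{E}_2)\cdot(\mathbf{E}_1+\mathbf{E}_2)
\]
\[
=(\mathbf{E}_1\cdot\mathbf{E}_1)+(\mathbf{E}_2\cdot\mathbf{E}_2)+(\mathbf{E}_1\cdot\mathbf{E}_2)+(\mathbf{E}_2\cdot\mathbf{E}_1)
\]
\[
=(\mathbf{E}_1\cdot\mathbf{E}_1)+(\mathbf{E}_2\cdot\mathbf{E}_2)+2(\mathbf{E}_1\cdot\mathbf{E}_2)
\]
\[
=I_1+I_2+2(\mathbf{E}_1\cdot\mathbf{E}_2)
\]

where $I_1$ and $I_2$ are the irradiances due to $\mathbf{E}_1$ and $\mathbf{E}_2$, respectively.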

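Consider the irradiance in the case where the incident fields are plane waves with electric fields oriented along the unit vectors $\hat{\mathbf{e}}_1$ and $\hat{\mathbf{e}}_2$, respectively:

\[
\mathbf{E}_1=\hat{\mathbf{e}}_1E_1\cos[\mathbf{k}_1\cdot\mathbf{r}-\omega_1t]
\]
\[
\mathbf{E}_2=\hat{\mathbf{e}}_2E_2\cos[\mathbf{k}_2\cdot\mathbf{r}-\omega_2t]
\]
\[
I=I_1+I_2+2\left(\hat{\mathbf{e}}_1E_1\cos[\mathbf{k}_1\cdot\mathbf{r}-\omega_1t]\cdot\hat{\mathbf{e}}_2E_2\cos[\mathbf{k}_2\cdot\mathbf{r}-\omega_2t]\right)
\]
\[
=I_1+I_2+2E_1E_2(\hat{\mathbf{e}}_1\cdot\hat{\mathbf{e}}_2)\left(\cos[\mathbf{k}_1\cdot\mathbf{r}-\omega_1t]\cos[\mathbf{k}_2\cdot\mathbf{r}-\omega_2t]\right)
\]
\[
=I_1+I_2+2E_1E_2(\hat{\mathbf{e}}_1\cdot\hat{\mathbf{e}}_2)\left(\cos[(\mathbf{k}_1-\mathbf{k}_2)\cdot\mathbf{r}-(\omega_1-\omega_2)t]\right)
\]
\[
=I_1+I_2+2E_1E_2(\hat{\mathbf{e}}_1\cdot\hat{\mathbf{e}}_2)\left(\cos[2\mathbf{k}_{mod}\cdot\mathbf{r}-2\omega_{mod}t]\right)
\]

In the case where the two components are polarized orthogonally so that $\hat{\mathbf{e}}_1\cdot\hat{\mathbf{e}}_2=0$, the irradiance is the sum of the component irradiances and no interference is seen.

In the case where $\omega_1=\omega_2$ so that $\omega_{mod}=0$, and $I_1=I_2$, the output irradiance is:

\[
I=2I_1\left(1+\cos[2\mathbf{k}_{mod}\cdot\mathbf{r}]\right)
\]

which again says that the irradiance includes a stationary sinusoidal fringe pattern with spatial period $\Lambda_{mod}=\frac{2\pi}{|2\mathbf{k}_{mod}|}$.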
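The role of the factor $\hat{\mathbf{e}}_1\cdot\hat{\mathbf{e}}_2$ can be seen in a small numerical sketch. It assumes the normalization used above, in which the component irradiances are $I_n=E_n^2$ (constant factors dropped), and it uses the conventional fringe visibility $(I_{max}-I_{min})/(I_{max}+I_{min})$, which is an assumption brought in for illustration. All function names are hypothetical.

```python
import math


def dot(u, v):
    """Scalar product of two vectors given as sequences of components."""
    return sum(a * b for a, b in zip(u, v))


def two_beam_irradiance(e1_hat, e2_hat, amp1, amp2, phase_difference):
    """I = I1 + I2 + 2 E1 E2 (e1 . e2) cos[phase difference], with In = En^2."""
    cross = 2.0 * amp1 * amp2 * dot(e1_hat, e2_hat) * math.cos(phase_difference)
    return amp1 ** 2 + amp2 ** 2 + cross


def fringe_visibility(e1_hat, e2_hat, amp1, amp2):
    """(Imax - Imin) / (Imax + Imin) over all phase differences."""
    swing = 2.0 * amp1 * amp2 * abs(dot(e1_hat, e2_hat))
    background = amp1 ** 2 + amp2 ** 2
    return swing / background


if __name__ == "__main__":
    x_hat, y_hat = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
    print("parallel polarizations:  ", fringe_visibility(x_hat, x_hat, 1.0, 1.0))
    print("orthogonal polarizations:", fringe_visibility(x_hat, y_hat, 1.0, 1.0))
```

Parallel fields of equal amplitude give full-contrast fringes, while orthogonal fields give a uniform irradiance $I_1+I_2$ regardless of the phase difference.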

11.2  Interferometers 

We have seen several times now that optical interference results when two (or more) waves are superposed in such a way as to produce a time-stationary spatial modulation of the superposed electric field, which may be observed by eye or a photosensitive detector. Interferometers use this result to measure different parameters of the light (e.g., wavelength $\lambda$, bandwidth $\Delta\lambda$ or coherence length $\Delta\ell$, angle and sphericity of the wavefront, etc.), or of the system (path length, traveling distance, index of refraction, etc.). Interferometers are generally divided into two classes that specify the method of separating a single wavefront into two (or more) wavefronts that may be recombined. The classes are division of wavefront and division of amplitude. The former type has already been considered; it divides wavefronts emitted by the source into two pieces and redirects one or both of them down different paths. They are recombined in a fashion such that $\mathbf{k}_1\neq\mathbf{k}_2$, even though $|\mathbf{k}_1|=|\mathbf{k}_2|$. The interference pattern is generated from the $\mathbf{k}_{mod}$ portion of the sum of the wavefronts. Division-of-amplitude interferometers use a partially reflecting mirror (the beamsplitter) to divide the wavefront into two beams which travel down different paths and are recombined by the original or another beamsplitter. The optical interference is generated by the phase difference between the recombined wavefronts.


11.2.1 Division-of-Amplitude Interferometers

This class of interferometers is distinguished from the just-considered division-of-wavefront interferometers by the presence of a beamsplitter, which divides the incident radiation into two parts by partial reflection/partial transmission. The two wavefronts are directed down different paths before recombining to create interference. For example, consider the Michelson interferometer shown below.


Note that the beamsplitter reflects part of the wave and transmits the rest. From the definition of the amplitude reflection coefficient:

\[
r=\frac{n_1-n_2}{n_1+n_2}\quad\text{at normal incidence}
\]

we see that the amplitude is multiplied by a negative number if $n_1<n_2$, meaning that the phase is changed by $\pi$ radians if reflected at a rare-to-dense interface (second surface has larger n). The reflection at a dense-to-rare interface exhibits no phase shift.
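A minimal sketch of this sign rule, using the normal-incidence formula above. The indices 1.0 (air) and 1.5 (a typical glass) are assumed sample values, and the function names are hypothetical.

```python
import math


def amplitude_reflection(n1, n2):
    """Amplitude reflection coefficient at normal incidence, going from n1 into n2."""
    return (n1 - n2) / (n1 + n2)


def reflection_phase_shift(n1, n2):
    """Phase change on reflection: pi radians if the coefficient is negative, else 0."""
    return math.pi if amplitude_reflection(n1, n2) < 0.0 else 0.0


if __name__ == "__main__":
    for n1, n2, label in ((1.0, 1.5, "rare-to-dense"), (1.5, 1.0, "dense-to-rare")):
        r = amplitude_reflection(n1, n2)
        print(label, "r =", r, "reflected fraction of irradiance =", r * r,
              "phase shift [rad] =", reflection_phase_shift(n1, n2))
```

The squared coefficient gives the reflected fraction of the irradiance, which is the same in both directions; only the sign of the amplitude differs.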

If the beamsplitter both reflects and transmits 50% of the irradiance (NOT 50% of the amplitude), then equal portions of the energy are directed toward the mirrors $M_1$ and $M_2$. The amplitude of the electric field of the reflected beam is:

\[
E_1=\sqrt{\frac{E_0^2}{2}}=E_0\cdot\frac{1}{\sqrt{2}}\cong 0.707\cdot E_0
\]

and the amplitude of the transmitted beam is $E_2=E_1$.

Because each beam is reflected once and transmitted once before being recombined, the amplitude of each component when recombined is:

\[
E_0\cdot\frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}=\frac{E_0}{2}
\]


Each beam experiences a phase delay proportional to the optical distance traveled in its arm of the interferometer:

\[
\phi_1=\frac{2\pi}{\lambda/n_1}\cdot d_1=\frac{2\pi}{\lambda}\cdot n_1d_1=k\cdot n_1d_1
\]

where $d_1$ is the distance traveled by beam #1 and $n_1$ is the refractive index in that path (n = 1 in vacuum or air). The beam directed at mirror $M_1$ travels distance $L_1$ from the beamsplitter to the mirror and again on the return, so the total physical path length is $d_1=2L_1$. Similarly, the physical length of the second path is $d_2=2L_2$ and the optical path is $n_2d_2=2n_2L_2$. After recombination, the relative phase delay is:

\[
\Delta\phi=\phi_1-\phi_2=k\left(n_1d_1-n_2d_2\right)
\]
\[
=2k\left(n_1L_1-n_2L_2\right)
\]
\[
=\frac{4\pi}{\lambda}\left(n_1L_1-n_2L_2\right)
\]

Note that the phase delay is proportional to $\lambda^{-1}$, i.e., longer wavelengths (red light) experience smaller phase delays than shorter wavelengths (blue light).
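The wavelength dependence of the relative phase delay can be illustrated with a short sketch of $\Delta\phi=\frac{4\pi}{\lambda}(n_1L_1-n_2L_2)$. The function name, arm lengths, and wavelengths are hypothetical sample values.

```python
import math


def michelson_phase_delay(wavelength, arm1, arm2, n1=1.0, n2=1.0):
    """Relative phase delay [rad] between the two arms after the round trips."""
    return 4.0 * math.pi * (n1 * arm1 - n2 * arm2) / wavelength


if __name__ == "__main__":
    arm1, arm2 = 0.100001, 0.100000  # arms differing by 1 micrometer (assumed)
    for name, lam in (("blue, 450 nm", 450e-9), ("red, 700 nm", 700e-9)):
        delay = michelson_phase_delay(lam, arm1, arm2)
        print(name, "phase delay [rad] =", delay,
              "=", delay / (2.0 * math.pi), "cycles")
```

For the same physical path difference the red light accumulates fewer cycles of phase than the blue light, as stated above.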

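The amplitude at the detector is the sum of the amplitudes:

\[
E(t)=\frac{E_0}{2}\cos\left[\frac{2\pi}{\lambda}\cdot n_1\cdot 2L_1-\omega_1t\right]+\frac{E_0}{2}\cos\left[\frac{2\pi}{\lambda}\cdot n_2\cdot 2L_2-\omega_1t\right]
\]
\[
=\frac{E_0}{2}\left(\cos\left[\frac{2\pi}{\lambda}\cdot n_1\cdot 2L_1-\omega_1t\right]+\cos\left[\frac{2\pi}{\lambda}\cdot n_2\cdot 2L_2-\omega_1t\right]\right)
\]

which has the form $\cos[A]+\cos[B]$, and thus may be rewritten (with $\omega_1=2\pi\nu$) as:

\[
E(t)=\frac{2E_0}{2}\cos\left[\frac{2\pi\left(2n_1L_1+2n_2L_2\right)}{2\lambda}-2\pi\nu t\right]\cdot\cos\left[\frac{2\pi\left(2n_1L_1-2n_2L_2\right)}{2\lambda}\right]
\]
\[
=E_0\cos\left[\frac{2\pi\left(n_1L_1+n_2L_2\right)}{\lambda}-2\pi\nu t\right]\cdot\cos\left[\frac{2\pi\left(n_1L_1-n_2L_2\right)}{\lambda}\right]
\]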

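If the indices of refraction in the two paths are equal (usually $n_1=n_2=1$), then the expression is simplified:

\[
E(t)=E_0\cos\left[\frac{2\pi n\left(L_1+L_2\right)}{\lambda}-2\pi\nu t\right]\cdot\cos\left[\frac{2\pi n\left(L_1-L_2\right)}{\lambda}\right]
\]
\[
=E_0\cos\left[\frac{2\pi\left(L_1+L_2\right)}{\lambda}-2\pi\nu t\right]\cdot\cos\left[\frac{2\pi\left(L_1-L_2\right)}{\lambda}\right]\quad\text{for }n=1
\]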


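One of the multiplicative terms is a rapidly oscillating function of time; the other term is stationary in time. The time average of the squared magnitude is the irradiance:

\[
I=\left\langle|E(t)|^2\right\rangle=E_0^2\left\langle\cos^2\left[\frac{2\pi\left(L_1+L_2\right)}{\lambda}-2\pi\nu t\right]\cdot\cos^2\left[\frac{2\pi\left(L_1-L_2\right)}{\lambda}\right]\right\rangle
\]
\[
=E_0^2\left\langle\cos^2\left[\frac{2\pi\left(L_1+L_2\right)}{\lambda}-2\pi\nu t\right]\right\rangle\cdot\cos^2\left[\frac{2\pi\left(L_1-L_2\right)}{\lambda}\right]
\]
\[
=E_0^2\cdot\frac{1}{2}\cdot\cos^2\left[\frac{2\pi\left(L_1-L_2\right)}{\lambda}\right]
\]

The identity $\cos^2[\theta]=\frac{1}{2}\left(1+\cos[2\theta]\right)$ may be used to recast the expression:

\[
I=E_0^2\cdot\frac{1}{2}\cdot\frac{1}{2}\left(1+\cos\left[\frac{2\pi}{\lambda}\cdot 2\left(L_1-L_2\right)\right]\right)
=\frac{E_0^2}{4}\left(1+\cos\left[\frac{2\pi}{\lambda}\cdot 2\left(L_1-L_2\right)\right]\right)
\]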


Note that this is not a function of time or position, but only of the lengths of the arms of the interferometer and of $\lambda$. The Michelson interferometer with monochromatic plane-wave inputs generates a uniform irradiance related to the path difference. If input wavefronts with other shapes are used, then fringes are generated whose period is a function of the local optical path difference, which is directly related to the shape of the wavefronts. If the incident light is a tilted plane wave, then the resulting pattern is analogous to that from a division-of-wavefront interferometer (Young's double-slit). If a point source emitting spherical waves is used, then the interferometer may be modeled as:

[Figure: model of the Michelson interferometer with a point source. Labels: point source; images $S_1$ and $S_2$ of the source, separated by $2(L_1-L_2)$; wavefronts from $S_1$ and from $S_2$.]


Images of the point source are formed at $S_1$ (due to mirror $M_1$) and at $S_2$ (due to $M_2$). The wavefronts superpose and form interference fringes. The positions of the fringes may be determined from the optical path difference (OPD). The OPD is the excess distance that one wavefront has to travel relative to the other before being recombined. For a ray oriented at angle $\theta$ measured relative to the axis of symmetry, the ray reflected from mirror $M_2$ travels an extra distance:

\[
OPD=2\left(L_1-L_2\right)\cdot\cos[\theta]
\]

The symmetry about the central axis ensures that the fringes are circular. The phase difference of the waves is the extra number of radians of phase that the wave must travel in that OPD:

\[
\Delta\phi=k\cdot OPD=\frac{2\pi}{\lambda}\cdot 2\left(L_1-L_2\right)\cdot\cos[\theta]
=\frac{4\pi\cdot\left(L_1-L_2\right)\cos[\theta]}{\lambda}
\]

If the phase difference is a multiple of $2\pi$ radians, then the waves recombine in phase and a maximum of the amplitude results. If the phase difference is an odd multiple of $\pi$ radians, then the waves recombine out of phase and a minimum of the irradiance results due to destructive interference. The locations of constructive interference (irradiance maxima) may be specified by:

\[
\Delta\phi=2\pi m\implies m\lambda=2\left(L_1-L_2\right)\cos[\theta]
\]

The corresponding angles $\theta$ are specified by:

\[
\theta=\cos^{-1}\left[\frac{m\lambda}{2\cdot\left(L_1-L_2\right)}\right]
\]

As the physical path difference $L_1-L_2$ decreases, $\frac{m\lambda}{2\left(L_1-L_2\right)}$ increases and $\theta$ decreases. In other words, if the physical path difference is decreased, the angular size of a circular fringe decreases and the fringes disappear into the center of the pattern. Since a particular fringe occurs at the same angle relative to the optical axis, these are called fringes of equal inclination.

If one mirror is tilted relative to the other, then the output beams travel in different directions when recombined. The fringes thus obtained are straight and have a constant spacing just like those from a Young's double-slit experiment.
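The behavior of the fringes of equal inclination can be sketched numerically from $m\lambda=2(L_1-L_2)\cos[\theta]$, together with the on-axis plane-wave irradiance $I=\frac{E_0^2}{4}\left(1+\cos\left[\frac{4\pi}{\lambda}(L_1-L_2)\right]\right)$. The function names and the sample lengths are hypothetical, and equal refractive indices (n = 1) are assumed in both arms.

```python
import math


def michelson_irradiance(wavelength, arm1, arm2, e0=1.0):
    """Uniform irradiance for monochromatic plane-wave input (n = 1 in both arms)."""
    phase = 4.0 * math.pi * (arm1 - arm2) / wavelength
    return 0.25 * e0 ** 2 * (1.0 + math.cos(phase))


def ring_angle(order, wavelength, arm1, arm2):
    """Angle [rad] of the bright ring of integer order m; None if no such ring exists."""
    cos_theta = order * wavelength / (2.0 * abs(arm1 - arm2))
    if cos_theta > 1.0:
        return None
    return math.acos(cos_theta)


if __name__ == "__main__":
    lam = 500e-9
    # Shrinking the path difference pulls the ring of order 38 toward the center.
    for path_difference in (10.0e-6, 9.75e-6, 9.5e-6):
        print("L1 - L2 =", path_difference, "m, angle of ring m = 38:",
              ring_angle(38, lam, path_difference, 0.0), "rad")
```

When the path difference shrinks far enough that $m\lambda>2(L_1-L_2)$, the ring of order m has collapsed into the center and the function returns None.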

11.2.2  Applications  of  the  Michelson  Interferometer 

1. Measure refractive index n: Insert a plane-parallel plate of known thickness t and unknown index n into one arm of a Michelson interferometer illuminated with light of wavelength $\lambda$. Count the number of fringes due to the plate:

\[
OPD=(n-1)\,t=\Delta m\cdot\lambda
\]

2. Measure the wavelength of an unknown source or $\Delta\lambda$ between two spectral lines of a single source. Count the number of fringes that pass a single point as one mirror is moved a known distance.

3. Measure lengths of objects: From 1960 to 1983, the standard meter was defined in terms of a particular spectral line:

1 m = 1,650,763.73 wavelengths of emission at $\lambda$ = 605.8 nm from Kr-86 measured in vacuum

This standard has the advantage of being transportable.

4.  Measure  deflections  of  objects:  The  optical  phase  difference  will  be  significant 
for  very  small  physical  path  differences.  Place  one  mirror  on  an  object  and 
count  fringes  to  measure  the  deflection. 

5.  Measure  the  velocity  of  light  (Michelson-Morley  experiment) 
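Two of the measurements listed above reduce to one-line calculations, sketched below under stated assumptions. The plate formula follows the relation $OPD=(n-1)t=\Delta m\cdot\lambda$ as written in item 1; the wavelength formula uses the fact that moving one mirror by a distance d changes the round-trip path by 2d, so that m fringes correspond to $2d=m\lambda$. The function names and numerical values are hypothetical.

```python
def wavelength_from_mirror_travel(mirror_travel, fringe_count):
    """Wavelength [m] from the fringes counted while one mirror moves mirror_travel [m]."""
    return 2.0 * mirror_travel / fringe_count


def index_from_plate(fringe_shift, wavelength, thickness):
    """Index of a plate from (n - 1) t = fringe_shift * wavelength, as in item 1."""
    return 1.0 + fringe_shift * wavelength / thickness


if __name__ == "__main__":
    # Assumed readings: 500 fringes pass while the mirror moves 158.2 micrometers.
    print("wavelength [m]:", wavelength_from_mirror_travel(158.2e-6, 500))
    # Assumed readings: a 10 micrometer plate shifts the pattern by 10 fringes at 500 nm.
    print("plate index:", index_from_plate(10, 500e-9, 10e-6))
```

The same fringe-counting logic applies to item 4: each fringe that passes corresponds to a mirror deflection of half a wavelength.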

11.2.3 Other Types of Division-of-Amplitude Interferometers

(2)  Mach-Zehnder 

The M-Z interferometer is very similar to the Michelson, except that a second beamsplitter is used to recombine the beams so that the light does not traverse the same path twice. Therefore there is no factor of 2 in the OPD for the M-Z. Mach-Zehnder interferometers often are used to measure the refractive index of liquids or gases. The container $C_1$ (or $C_2$) is filled with a gas while examining the fringe pattern. As the container fills, the refractive index n increases and so does the optical path length in that arm. The optical path difference is:

\[
OPD=(n-1)\cdot\delta
\]

The fringes move and each new cycle of the pattern corresponds to an increase in the OPD of $\lambda$. After m fringes are counted, the index of refraction is found via:

\[
\left(n_1-1\right)\cdot\delta\cdot\frac{2\pi}{\lambda}=m\cdot 2\pi
\]
\[
n_1=1+\frac{m\lambda}{\delta}
\]

(3)  Sagnac  Interferometer 

The Sagnac interferometer is a single-beamsplitter version of the M-Z; the output beamsplitter is exchanged for a mirror which is reversed to create a loop path. Light travels around the loop in both directions so that the optical path difference is zero for a stable configuration. However, if the interferometer (including the illuminator) is rotated as on a turntable, then light in one path will experience a Doppler shift with increasing frequency (blue shift), while light in the reverse direction will experience a red shift. The phase of the two beams will change in proportion to the frequency shift, and the superposed light will exhibit a sinusoidal variation in the detected signal over time:

\[
\cos[\omega_1t]+\cos[\omega_2t]=2\cos[\omega_{avg}t]\cdot\cos[\omega_{mod}t]
\]

The slower-varying modulation frequency is detectable and linearly proportional to the rotation rate. This device may be used as a gyroscope with no moving parts, and in fact may be constructed from a single optical fiber that forms a loop with counterrotating beams.


(4)  Fizeau  Interferometer 

The Fizeau interferometer uses a single beamsplitter and may be used to measure the difference in shape between a test optical surface and a reference surface. In the drawing, the physical length difference between the path reflected from the bottom of the test optic and that reflected from the top of the reference surface is d.


Part of the incident beam is reflected from the glass-air interface of the test object. This dense-to-rare reflection has no phase shift. The reflection from the glass reference surface is rare-to-dense, and the phase of the light is changed by $\pi$ radians. The two waves are recombined when they emerge from the top of the test surface, and detected. Because the beams traverse the same path in each direction, the optical path difference is doubled, so an increment in the physical path of $\frac{\lambda}{2}$ changes the optical path by $\lambda$ and one fringe cycle is seen. If the test optic is spherical, then the physical path difference d may be expressed in terms of the radius of curvature R and the radial distance r. Pythagoras says that:

\[
R^2=(R-d)^2+r^2\implies r^2=2Rd-d^2\cong 2Rd\quad\text{for }d\ll R
\]
\[
d\cong\frac{r^2}{2R}\quad\text{for }d\ll R
\]

If the interstice between the optical elements is filled with air, and if m fringes are counted between two points at radial distances $r_1$ and $r_2$, then the corresponding thickness change is:

\[
m\cdot\frac{\lambda}{2}=OPD=nd\implies d=m\cdot\frac{\lambda}{2}\quad\text{in air}
\]
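The quality of the approximation $d\cong r^2/(2R)$ and the resulting fringe count can be checked with a few lines. This is a sketch under the assumptions of an air gap (n = 1) and a reference surface touching the sphere at r = 0; the function names and the sample radius of curvature are hypothetical.

```python
import math


def gap_exact(radial_distance, radius_of_curvature):
    """Exact gap d from R^2 = (R - d)^2 + r^2."""
    big_r = radius_of_curvature
    return big_r - math.sqrt(big_r ** 2 - radial_distance ** 2)


def gap_approx(radial_distance, radius_of_curvature):
    """Approximate gap d = r^2 / (2R), valid for d much smaller than R."""
    return radial_distance ** 2 / (2.0 * radius_of_curvature)


def fringes_between(r1, r2, radius_of_curvature, wavelength):
    """Fringe cycles counted between radii r1 and r2 in air: m = 2 * (d2 - d1) / wavelength."""
    change = gap_approx(r2, radius_of_curvature) - gap_approx(r1, radius_of_curvature)
    return 2.0 * change / wavelength


if __name__ == "__main__":
    big_r, lam = 1.0, 500e-9  # assumed: 1 m radius of curvature, 500 nm light
    print("exact gap at r = 5 mm [m]: ", gap_exact(5e-3, big_r))
    print("approx gap at r = 5 mm [m]:", gap_approx(5e-3, big_r))
    print("fringes from center to r = 5 mm:", fringes_between(0.0, 5e-3, big_r, lam))
```

Counting fringes outward from the center therefore gives the radius of curvature of the test surface once the wavelength is known.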


11.3  Diffraction 


In geometrical (ray) optics, light is assumed to propagate in straight lines from the source (rectilinear propagation). However, Grimaldi observed in the 1600s that this model does not conform to reality after light interacts with an obstruction. Grimaldi observed that light deviates from straight-line propagation into the shadow region. He named this phenomenon diffraction. This spreading of a bundle of rays affects the sharpness of shadows cast by opaque objects; the edges become fuzzy because light propagates into the geometrical shadow region.

Diffraction really is the same phenomenon as interference. In both, the wave character of light creates stationary regions of constructive and destructive interference that may be observed as bright and dark regions. In the simplest case of two sources of infinitesimal size, the superposition wave may be determined by summing spherical wave contributions from the sources; the effect is considered to be interference. If the apertures are large (compared to the wavelength $\lambda$), then the spherical-wave contributions from a large number of subsources are summed (by integrating over the area of the aperture) to determine the total electric field. The superposition electric field vector (magnitude and phase) is the vector sum of the fields due to these spherical-wave subsources. The mathematical model for diffraction is straightforward to develop, though computations may be tedious.

Recall the form of a spherical wave emitted by a source located at the origin of coordinates; energy conservation requires that the energy density of the electric field of a spherical wave decrease as the square of the distance from the source. Correspondingly, the electric field decreases as the reciprocal of the distance from the source. The electric field observed at location $[x_1,y_1,z_1]$ due to a spherical wave emitted from the origin is:

\[
s[x_1,y_1,z_1,t]=\frac{E_0}{\sqrt{x_1^2+y_1^2+z_1^2}}\cos\left[k_x\cdot x_1+k_y\cdot y_1+k_z\cdot z_1-\omega t\right]
\]


This observation that light from a point source generates a spherical wave is the first step towards Huygens' principle, which states that every point on a wavefront may be modeled as a "secondary source" of spherical waves. The summation of the waves from the secondary sources (sometimes called "wavelets") produces a new wavefront that is "farther downstream" in the optical path.

In the more general case, a spherical wave emitted from a source located at coordinates $[x_0,y_0,z_0]$ and observed at $[x_1,y_1,z_1]$ has the form:

\[
s[\mathbf{r},t]=\frac{E_0}{|\mathbf{r}|}\cos\left[\mathbf{k}_0\cdot\mathbf{r}-\omega_0t\right]\rightarrow\frac{E_0}{|\mathbf{r}|}\exp\left[i\left(\mathbf{k}_0\cdot\mathbf{r}-\omega_0t\right)\right]
\]

where $|\mathbf{r}|=\sqrt{(x_1-x_0)^2+(y_1-y_0)^2+(z_1-z_0)^2}$ and $|\mathbf{k}_0|=\frac{2\pi}{\lambda_0}$.

For large $|\mathbf{r}|$, the spherical wave may be approximated accurately as a paraboloidal wave, and for VERY large $|\mathbf{r}|$ the sphere becomes a plane wave. The region where the first approximation is acceptable defines Fresnel diffraction, while the more distant region where the second approximation is valid is the Fraunhofer diffraction region.


11.3.1  Diffraction  Integrals 


Consider the electric field emitted from a point source located at $[x_0,y_0,z_0=0]$. The wave propagates in all directions. The electric field of that wave is observed on an observation plane centered about coordinate $z_1$. The location in the observation plane is described by the two coordinates $[x_1,y_1]$. The electric field at $[x_1,y_1]$ at this distance $z_1$ from a source located in the plane $[x_0,y_0]$ centered about $z=z_0=0$ is:

\[
E[x_1,y_1;z_1,x_0,y_0,0]=\frac{E_0}{|\mathbf{r}|}\cos\left[\mathbf{k}_0\cdot\mathbf{r}-\omega_0t\right],\quad\text{where }|\mathbf{r}|=\sqrt{(x_1-x_0)^2+(y_1-y_0)^2+z_1^2}
\]

Though this may LOOK complicated, it is just an expression of the electric field propagated as a spherical wave from a source in one plane to the observation point in another plane; the amplitude decreases as the reciprocal of the distance and the phase is proportional to the distance and time. A diffraction calculation based on the superposition of spherical waves is called Rayleigh-Sommerfeld diffraction.

Now, observe the electric field at that same location $[x_1,y_1,z_1]$ that is generated from many point sources located in the x-y plane at $z=0$. The summation of the fields is computed as an integral of the electric fields due to each point source. The integral is over the area of the source plane. If all sources emit the same amplitude $E_0$, then the integral is simplified somewhat:

\[
E_{total}[x_1,y_1;z_1]=\iint_{-\infty}^{+\infty}E[x_1,y_1,z_1,x_0,y_0,0]\,dx_0\,dy_0
\]
\[
=\iint_{aperture}\frac{E_0}{\sqrt{(x_1-x_0)^2+(y_1-y_0)^2+z_1^2}}\times\cos\left[\frac{2\pi}{\lambda_0}\cdot\sqrt{(x_1-x_0)^2+(y_1-y_0)^2+z_1^2}-\omega t\right]dx_0\,dy_0
\]

This integral may be recast into a different form by defining the shape of the aperture to be a 2-D function $f[x_0,y_0]$ in the source plane:

\[
E_{total}[x_1,y_1;z_1]=E_0\iint_{-\infty}^{+\infty}\frac{f[x_0,y_0]}{\sqrt{(x_1-x_0)^2+(y_1-y_0)^2+z_1^2}}\times\cos\left[\frac{2\pi}{\lambda_0}\cdot\sqrt{(x_1-x_0)^2+(y_1-y_0)^2+z_1^2}-\omega t\right]dx_0\,dy_0
\]


This expression is the diffraction integral. Again, this expression LOOKS complicated, but really represents just the summation of the electric fields due to all point sources. Virtually the entire study of optical diffraction is the application of various schemes to simplify and apply this equation. We will simplify it for two cases:

(1) Observation plane $z_1$ located near to the source plane $z_0=0$; this is "near-field," or "Fresnel diffraction" (the s in "Fresnel" is silent; the name is pronounced Fre'-nel).

(2) Observation plane $z_1$ located far enough from the source plane at $z_0=0$ so that $z_1\rightarrow\infty$. This is called "far-field," or "Fraunhofer diffraction," which is particularly interesting (and easier to compute) because the diffraction integral is proportional to the Fourier transform of the object distribution (shape of the aperture).
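The summation described by the diffraction integral can be sketched directly by adding the spherical-wave contributions of many point sources. The example below is a deliberately simplified, hypothetical 1-D version: the aperture is a slit sampled along $x_0$ only, each source contributes the complex phasor $\exp[ikr]/r$ (the common factor $\exp[-i\omega t]$ is dropped, since it does not affect the time-averaged irradiance), and the irradiance is taken as the squared magnitude of the sum. The slit width, wavelength, and distance are assumed sample values.

```python
import cmath
import math


def slit_field(x1, z1, wavelength, slit_width, samples=400):
    """Sum of spherical-wave phasors exp(i k r) / r from point sources across a slit."""
    k = 2.0 * math.pi / wavelength
    dx0 = slit_width / samples
    total = 0j
    for j in range(samples):
        x0 = -0.5 * slit_width + (j + 0.5) * dx0  # midpoint of the j-th subsource
        r = math.hypot(x1 - x0, z1)               # distance from source to observer
        total += cmath.exp(1j * k * r) / r * dx0
    return total


def slit_irradiance(x1, z1, wavelength, slit_width, samples=400):
    """Relative irradiance: squared magnitude of the summed field."""
    return abs(slit_field(x1, z1, wavelength, slit_width, samples)) ** 2


if __name__ == "__main__":
    lam, width, z1 = 500e-9, 100e-6, 1.0  # assumed: 100 micrometer slit viewed at 1 m
    for x1 in (0.0, 2.5e-3, 5.0e-3, 7.5e-3):
        print("x1 =", x1, "m  relative irradiance =",
              slit_irradiance(x1, z1, lam, width) / slit_irradiance(0.0, z1, lam, width))
```

No approximation of the distance r is made here; the Fresnel and Fraunhofer cases replace the square root by simpler expressions so that the sum can be evaluated analytically.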

A schematic of the diffraction regions is shown in the figure.

Schematic of the diffraction regions for spherical waves emitted by a point source. Rayleigh-Sommerfeld diffraction is based on the spherical waves emitted by the source. Fresnel diffraction is an approximation based on the assumption that the wavefronts are parabolic and with unit amplitude out to $\infty$. The "width" of the quadratic phase is indicated by $a_\pi$; this is the off-axis distance from the origin where the phase change is $\pi$ radians. Fraunhofer diffraction assumes that the spherical wave has traveled a large distance and the wavefronts may be approximated by planes.


11.3.2  Fresnel  Diffraction 


Consider the first case of the diffraction integral where the observation plane is near
to the source plane, where the concept of near must be defined. Note that the
distance |r| appears twice in the expression for the electric field due to a point source:
once in the denominator and once in the phase of the cosine. The first term
affects the size (magnitude) of the electric field, and the scalar product of the second
with the wavevector k is computed to determine the rapidly changing phase angle
of the sinusoid. Because the phase changes very quickly with time (because $\omega$ is
very large, $\omega \cong 10^{15}$ radians/second) and with distance (because |k| is very large,
$|k| \cong 10^{7}$ m$^{-1}$), the phases of light observed at one point in the observation
plane but generated from two points in the source plane may differ by MANY radians.
Simply put, small changes in the propagation distance |r| have great significance for
the computation of the phase, but much less for computing the amplitude of
the electric field. Therefore, the distance may be approximated more crudely in the
denominator than in the phase.
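The difference in sensitivity can be checked numerically. The sketch below is only an illustration under assumed values that do not come from the text: a wavelength of 500 nm, an observation plane at 1 m, an on-axis observation point, and two source points 5 mm apart. It compares the change in phase with the fractional change in the amplitude factor 1/|r|.

```python
import math

wavelength = 500e-9   # assumed wavelength [m] (illustrative value)
z1 = 1.0              # assumed distance to the observation plane [m]

def path_length(x0, x1=0.0, z=z1):
    """Exact distance |r| from a source point at x0 to an observation point at x1."""
    return math.sqrt((x1 - x0) ** 2 + z ** 2)

r_a = path_length(0.0)      # source point on the axis
r_b = path_length(5.0e-3)   # source point 5 mm off axis (assumed)

delta_r = r_b - r_a
phase_change = 2.0 * math.pi * delta_r / wavelength      # radians
amplitude_change = abs(1.0 / r_b - 1.0 / r_a) * r_a      # fractional change of 1/|r|

print(f"path difference:  {delta_r:.3e} m")
print(f"phase difference: {phase_change:.1f} rad")
print(f"amplitude change: {amplitude_change:.1e} (fractional)")
```

With these assumed numbers the phase differs by many radians (roughly 25 cycles) while the amplitude factor changes by only about one part in 10^5, which is why the cruder approximation is acceptable in the denominator.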

Now  consider  the  approximation  of  the  distance  |r|.  The  complete  expression  is: 


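$$\begin{aligned}
|r| &= \sqrt{(x_1-x_0)^2+(y_1-y_0)^2+z_1^2} \\
&= \sqrt{z_1^2\cdot\left(1+\frac{(x_1-x_0)^2+(y_1-y_0)^2}{z_1^2}\right)} \\
&= z_1\cdot\sqrt{1+\frac{(x_1-x_0)^2+(y_1-y_0)^2}{z_1^2}} \\
&= z_1\left(1+\frac{(x_1-x_0)^2+(y_1-y_0)^2}{z_1^2}\right)^{1/2}
\end{aligned}$$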


This  is  an  EXACT  expression  that  may  be  expanded  into  a  power  series  by  applying 
the  binomial  theorem.  The  general  binomial  expansion  is: 


$$(1+a)^n = 1 + na + \frac{n\cdot(n-1)}{2!}a^2 + \frac{n\cdot(n-1)\cdot(n-2)}{3!}a^3 + \cdots + \frac{n!}{(n-r)!\,r!}a^r + \cdots$$


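This series converges to the correct value if $a^2 < 1$. For the case $n = \frac{1}{2}$ (square root),
the result is:

$$(1+a)^{1/2} = 1 + \frac{a}{2} - \frac{a^2}{8} + \frac{a^3}{16} - \cdots$$

which leads to an expression for the distance |r|:

$$|r| = z_1\left[1 + \frac{1}{2}\,\frac{(x_1-x_0)^2+(y_1-y_0)^2}{z_1^2} - \frac{1}{8}\left(\frac{(x_1-x_0)^2+(y_1-y_0)^2}{z_1^2}\right)^2 + \cdots\right]$$

If $z_1$ is sufficiently large, terms of second and larger order may be assumed to be
sufficiently close to zero that they may be ignored, leaving the approximation:

$$|r| \cong z_1\left[1 + \frac{1}{2}\,\frac{(x_1-x_0)^2+(y_1-y_0)^2}{z_1^2}\right] = z_1 + \frac{(x_1-x_0)^2+(y_1-y_0)^2}{2z_1}$$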
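A short numerical comparison shows how good the first-order (paraboloidal) approximation of the distance is. The sketch below is illustrative only; the offset of 1 mm and the distances of 0.1 m and 1 m are assumed values, and the function names are hypothetical. It compares the exact distance with the approximation and with the leading neglected term of the binomial series.

```python
import math

def exact_distance(dx, dy, z1):
    """Exact distance |r| for transverse offsets dx = x1 - x0 and dy = y1 - y0."""
    return math.sqrt(dx * dx + dy * dy + z1 * z1)

def paraboloidal_distance(dx, dy, z1):
    """First-order binomial approximation: z1 + (dx^2 + dy^2) / (2 z1)."""
    return z1 + (dx * dx + dy * dy) / (2.0 * z1)

def leading_neglected_term(dx, dy, z1):
    """Second-order term of the series, -(1/8) a^2 z1 with a = (dx^2 + dy^2) / z1^2."""
    a = (dx * dx + dy * dy) / (z1 * z1)
    return -z1 * a * a / 8.0

def approximation_error(dx, dy, z1):
    """Approximate minus exact distance (positive: the paraboloid overestimates |r|)."""
    return paraboloidal_distance(dx, dy, z1) - exact_distance(dx, dy, z1)

for z1 in (0.1, 1.0):                      # assumed distances [m]
    err = approximation_error(1.0e-3, 0.0, z1)
    print(f"z1 = {z1} m: error = {err:.3e} m, "
          f"-(second-order term) = {-leading_neglected_term(1.0e-3, 0.0, z1):.3e} m")
```

The error is set almost entirely by the first neglected term and shrinks as the observation plane moves away from the source, consistent with the requirement that $z_1$ be sufficiently large.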
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This  may  be  simplified  further  by  recasting  the  electric  field  expression  into  complex 
notation: 


$$E[x_1,y_1;z_1,x_0,y_0,0] \cong \frac{E_0}{z_1}\,\mathrm{Re}\left\{\exp\left[2\pi i\,\frac{z_1}{\lambda_0}\right]\exp\left[-2\pi i\nu t\right]\exp\left[\frac{i\pi}{\lambda_0 z_1}\left((x_1-x_0)^2+(y_1-y_0)^2\right)\right]\right\}$$

The phase of this approximation of the spherical wave includes a constant phase
$2\pi z_1/\lambda_0$, a time-varying phase $-2\pi\nu t$, and a last term whose phase is proportional to the
square of the off-axis distance of the observation point from the source point. In
the approximation, the wavefront emitted by a point source is not a sphere, but rather
a paraboloid.


Note the unreasonable part of the assumption of Fresnel diffraction: the wavefront
is assumed to have constant squared magnitude regardless of the location $[x_1, y_1]$
where the field is measured. In other words, the paraboloidal wave in Fresnel
diffraction has the same “brightness” regardless of how far off axis it is measured.


For larger values of $z_1$ (observation plane farther from the source), the radius of
curvature of the approximate paraboloidal waves increases, so the change in phase
measured for nearby points in the observation plane decreases. As $z_1$ approaches $\infty$,
the paraboloid approaches a plane wave.


This  electric  field  is  substituted  into  the  diffraction  integral  to  obtain  the  approx¬ 
imate  expression  in  the  near-field: 
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Again,  this  LOOKS  complicated,  but  really  is  just  a  collection  of  the  few  parts  that 
we  have  considered  already.  In  words,  the  integral  says  that  the  electric  field  down¬ 
stream  but  near  to  the  source  function  is  the  summation  of  paraboloidal  fields  from 
the  individual  sources.  The  paraboloidal  approximation  significantly  simplifies  the 
computation  of  the  diffracted  light. 


11.3.3  Fresnel  Diffraction  Integral  as  a  Convolution 

Consider  the  Fresnel  diffraction  integral: 


$$E[x_1,y_1] = \iint_{-\infty}^{+\infty} f[x_0,y_0]\,\exp\left[\frac{i\pi}{\lambda_0 z_1}\left((x_1-x_0)^2+(y_1-y_0)^2\right)\right] dx_0\,dy_0$$


Define  the  exponential  to  be  a  function  h  that  depends  on  the  four  variables  in  a 
particular  way: 


$$h[x_1-x_0,\,y_1-y_0] = \exp\left[\frac{i\pi}{\lambda_0 z_1}\left((x_1-x_0)^2+(y_1-y_0)^2\right)\right]$$


In  other  words,  the  Fresnel  diffraction  integral  may  be  written  as: 


$$E[x_1,y_1] = \iint_{-\infty}^{+\infty} f[x_0,y_0]\;h[x_1-x_0,\,y_1-y_0]\;dx_0\,dy_0$$


Integral  equations  of  this  form  abound  in  all  areas  of  physical  science,  and  particularly 
in  imaging;  they  are  called  convolution  integrals.  The  function  h  is  the  shape  of  the 
integral  function  and  is  often  called  the  impulse  response  of  the  integral  operator.  In 
imaging,  and  particularly  in  optics,  the  impulse  response  often  is  called  the  point- 
spread  function.  In  other  areas  of  physics,  it  has  other  names  (e.g.,  Green’s  function). 
The integral operator often is given a shorthand notation, such as the asterisk ($\ast$).
The variables of integration also often are renamed as dummy variables, such as $\alpha, \beta$:

$$E[x,y] = \iint_{-\infty}^{+\infty} f[\alpha,\beta]\;h[x-\alpha,\,y-\beta]\;d\alpha\,d\beta \equiv f[x,y] \ast h[x,y]$$
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where  the  form  of  the  impulse  response  for  Fresnel  diffraction  is: 

$$h[x,y] = \exp\left[\frac{i\pi\,(x^2+y^2)}{\lambda_0 z_1}\right]$$

This impulse response is called a “chirp” function: the real and imaginary parts are
both sinusoids whose spatial frequency varies with position and also differs with the
distance $z_1$. The parameters of the chirp often are combined into $\sqrt{\lambda_0 z_1} \equiv a$,
so that the phase of the chirp function is $\pi$ radians where $\sqrt{x^2+y^2} = a$. Again note that
the magnitude of the impulse response is the unit constant:

$$\left|h[x,y]\right| = 1[x,y]$$

which  indicates  that  the  assumed  illumination  from  a  point  source  in  the  Fresnel 
diffraction  region  is  constant  off  axis;  there  is  no  “inverse  square  law.”  This  obviously 
unphysical  assumption  limits  the  usefulness  of  calculated  diffraction  patterns  to  the 
immediate  vicinity  of  the  optical  axis  of  symmetry. 


Profiles of the impulse response along a radial axis are shown for $a = 1$ and $a = 2$.
The source distance is larger by a factor of four in the second case.

1-D profiles of the impulse response of Fresnel diffraction for (a) $\sqrt{\lambda_0 z_1} = 1$ and (b)
$\sqrt{\lambda_0 z_1} = 2$, so that $z_1$ is four times larger in (b). Note that the phase increases less
rapidly with $x$ for increasing distances from the source.

1-D profiles of the impulse response of Fresnel diffraction for $\lambda_0 z_1 = 1$. Note that the
magnitude of the impulse response is 1 and the phase is a quadratically increasing
function of $x$.

The  convolution  integral  is  straightforward  to  implement. 
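As a minimal sketch of such an implementation, the code below evaluates the 1-D Fresnel convolution of a STEP function (a knife edge that is open for $x_0 \geq 0$) with the chirp $\exp[i\pi x^2/a^2]$, where $a = \sqrt{\lambda_0 z_1}$. Several choices here are assumptions made for illustration rather than the method behind the figures: the integral is truncated at a finite distance, it is evaluated by a simple midpoint sum, and the result is normalized by the field computed with no obstruction. The function names are hypothetical.

```python
import cmath
import math

def chirp_sum(x1, lo, hi, a, dx):
    """Midpoint-rule estimate of the integral of exp[i*pi*(x1 - x0)^2 / a^2] over lo <= x0 <= hi."""
    n = int(round((hi - lo) / dx))
    total = 0j
    for k in range(n):            # an empty range (n <= 0) leaves the sum at zero
        x0 = lo + (k + 0.5) * dx
        total += cmath.exp(1j * math.pi * (x1 - x0) ** 2 / a ** 2)
    return total * dx

def knife_edge_irradiance(x1, a=1.0, reach=40.0, step=0.002):
    """Irradiance at x1 behind a STEP source (open for x0 >= 0), relative to no obstruction."""
    half = reach * a              # truncation of the formally infinite integral (assumption)
    dx = step * a                 # sample spacing, small enough to follow the chirp
    edge = chirp_sum(x1, max(0.0, x1 - half), x1 + half, a, dx)
    free = chirp_sum(x1, x1 - half, x1 + half, a, dx)
    return abs(edge / free) ** 2

for a in (1.0, 2.0):
    row = [round(knife_edge_irradiance(x, a), 3) for x in (-1.0, 0.0, 1.0, 2.0)]
    print(f"a = {a}: irradiance at x = -1, 0, 1, 2 ->", row)
```

At the geometrical edge ($x_1 = 0$) exactly half of the unobstructed amplitude arrives, so the relative irradiance there is 1/4 for any value of $a$; the overshoot on the bright side and the decay into the shadow appear at larger offsets when $a$ is larger.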

Computed  Examples  of  Fresnel  Diffraction 

Below  are  computed  simulations  of  the  profiles  of  diffraction  patterns  that  would  be 
generated  from  a  knife  edge  at  the  same  distances  from  the  source  as  shown  above. 
Note  the  “ringing”  at  the  edges  and  that  the  fringes  are  farther  apart  when  observed 
farther from the source. Compare these images to actual Fresnel diffraction patterns
in Hecht.


[Figure: panels (a) and (b), plotted against $x$.]

1-D profiles of the irradiance (squared magnitude) of diffraction patterns from a
sharp “knife edge” (modeled as the STEP function shown) for the same distances
from the origin: (a) $\sqrt{\lambda_0 z_1} = 1$; (b) $\sqrt{\lambda_0 z_1} = 2$, so that $z_1$ is four times larger. Note that
the “period” of the oscillation has increased with increasing distance from the source
and that the irradiance at the origin is not zero but rather $\frac{1}{4}$.

Since convolution is linear and shift invariant, the “images” of rectangular apertures
may be calculated at these two distances by replicating the edge responses, reversing
one, and adding the amplitudes before computing the irradiance.

1-D profiles of the irradiance (squared magnitude) of diffraction patterns from
rectangle functions for different distances from the origin, obtained by replicating the edge
responses, reversing one, and adding the amplitudes before computing the irradiance:
(a) $\sqrt{\lambda_0 z_1} = 1$; (b) $\sqrt{\lambda_0 z_1} = 2$, so that $z_1$ is four times larger.


Characteristics  of  Fresnel  Diffraction 

The parabolic approximation to the spherical impulse response of light propagation
produces “images” of the original object that have “fuzzy edges” and oscillating
amplitude on the bright side of an edge. At a fixed distance from the object, the
width of the diffraction pattern is proportional to the width of the original object; if
the object becomes wider, so does the “image” in the Fresnel diffraction pattern.

Figure 11.2: The Fresnel diffraction patterns at the same distance from the origin
for two rectangles with different widths, showing that the “width” of the Fresnel pattern
is proportional to the width of the object.


11.3.4  Diffraction  Integral  Valid  Far  from  Source 


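The diffraction integral may be further simplified for the case where the distance
from the source to the observation plane is sufficiently large to allow the electric field
from an individual source to be approximated by a plane wave. The process may be
considered for one of the paraboloidal waves by expanding the squares in its phase:

$$\exp\left[\frac{i\pi}{\lambda_0 z_1}\left((x_1-x_0)^2+(y_1-y_0)^2\right)\right] = \exp\left[\frac{i\pi\,(x_0^2+y_0^2)}{\lambda_0 z_1}\right]\cdot\exp\left[\frac{i\pi\,(x_1^2+y_1^2)}{\lambda_0 z_1}\right]\cdot\exp\left[-\frac{2\pi i\,(x_0x_1+y_0y_1)}{\lambda_0 z_1}\right]$$

If the source is restricted to be near to the optical axis so that $x_0, y_0 \cong 0$ (or, more
rigorously, if $x_0^2 + y_0^2 \ll \lambda_0 z_1$), then

$$\exp\left[\frac{i\pi\,(x_0^2+y_0^2)}{\lambda_0 z_1}\right] \cong 1$$

Similarly, if the observation point is near to the optical axis so that $x_1^2 + y_1^2 \ll \lambda_0 z_1$,
then:

$$\exp\left[\frac{i\pi\,(x_1^2+y_1^2)}{\lambda_0 z_1}\right] \cong 1$$

Though $x_0$ and $x_1$ are sufficiently small for these approximations, the third
exponential term is retained because their difference may be larger:

$$\exp\left[\frac{i\pi}{\lambda_0 z_1}\left((x_1-x_0)^2+(y_1-y_0)^2\right)\right] \cong \exp\left[-\frac{2\pi i\,(x_0x_1+y_0y_1)}{\lambda_0 z_1}\right]$$

Considered in the observation plane as a function of $x_1$ and $y_1$, the phase of the
wavefront is linear, with slopes proportional to the source variables $[x_0, y_0]$; the wavefront
is a plane. The corresponding approximation for the diffraction integral is:

$$E[x_1,y_1] \propto \iint_{-\infty}^{+\infty} f[x_0,y_0]\,\exp\left[-\frac{2\pi i\,(x_0x_1+y_0y_1)}{\lambda_0 z_1}\right] dx_0\,dy_0$$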
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The  diffracted  light  far  from  the  source  is  a  summation  of  the  plane  waves  generated  by 
each  source  point.  This  is  called  the  Fraunhofer  diffraction  formula,  and  the  resulting 
patterns  are  VERY  different  from  Fresnel  diffraction  from  the  same  aperture.  In  fact, 
the formula can be interpreted as a Fourier transform where the frequency coordinates
are mapped back to the space domain via $\xi = x_1/(\lambda_0 z_1)$ and $\eta = y_1/(\lambda_0 z_1)$.
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The  resulting  irradiance  patterns  are  the  squared  magnitudes  of  the  fields.  The 
Fourier transform relationship means that the diffraction patterns (the “images”)
scale in inverse proportion to the original functions; larger input functions $f[x,y]$
produce smaller (and brighter) diffraction patterns.
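The inverse scaling can be verified with a direct numerical Fourier integral. The sketch below assumes a 1-D rectangular aperture of unit transmission and evaluates $F[\xi] = \int \mathrm{RECT}[x/w]\,\exp[-2\pi i \xi x]\,dx$ by a midpoint sum; the function name, the widths, and the number of samples are illustrative choices.

```python
import cmath
import math

def fraunhofer_amplitude(width, xi, samples=2000):
    """Midpoint-sum estimate of the Fourier integral of a rectangle of the given width."""
    dx = width / samples
    total = 0j
    for k in range(samples):
        x = -width / 2.0 + (k + 0.5) * dx      # sample positions inside the aperture
        total += cmath.exp(-2j * math.pi * xi * x)
    return total * dx

for width in (1.0, 2.0):
    peak = abs(fraunhofer_amplitude(width, 0.0))
    first_zero = 1.0 / width                   # expected location of the first zero
    residual = abs(fraunhofer_amplitude(width, first_zero))
    print(f"width {width}: peak amplitude {peak:.3f}, "
          f"|F| at xi = {first_zero:.2f} is {residual:.1e}")
```

Doubling the aperture width doubles the central amplitude and halves the distance to the first zero, which in the observation plane lies at $x_1 = \lambda_0 z_1 / w$ under the coordinate mapping stated above.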


Computed  Examples  of  Fraunhofer  Diffraction 

Below are shown the profiles of square apertures and computed simulations of the
resulting amplitude diffraction pattern in the Fraunhofer diffraction region (the
irradiance is the squared magnitude of the plotted amplitude). In both cases, the object
is a point source located at $z_1 = \infty$ so that the light “fills” both apertures with zero
phase. For a fixed (large) distance from the object, the “images” of the diffracted
light get “narrower” and “taller” as the aperture width increases.

1-D profiles of Fraunhofer diffraction patterns: (a) input objects are two rectangles
that differ in width; (b) amplitude (NOT irradiance) of the corresponding
Fraunhofer diffraction patterns, showing that the wider aperture produces a
“brighter” and “narrower” amplitude distribution.


A  useful  measure  of  the  “width”  of  the  Fraunhofer  pattern  is  labeled  for  both  cases; 
this  is  the  distance  from  the  center  of  symmetry  to  the  first  zero.  This  “width”  is  a 
measure  of  the  ability  of  the  system  to  resolve  fine  detail,  because  two  point  sources 
would  each  generate  their  own  diffraction  pattern.  As  the  angular  separation  of  the 
point  sources  decreases,  it  would  be  more  difficult  to  distinguish  that  there  are  two 
overlapping  patterns. 
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Illustrations  of  resolution  in  Fraunhofer  diffraction:  (a)  individual  images  of  two 
point  sources  in  the  Fraunhofer  domain;  (b)  sum  of  the  two  images,  showing  that 
they  may  be  distinguished  easily;  (c)  images  of  two  sources  that  are  closer  together; 
(d)  sum  showing  that  it  is  much  more  difficult  to  distinguish  the  sources. 
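The loss of resolution can be illustrated with a small calculation. The sketch below assumes two equal point sources that are mutually incoherent (so that their irradiances add) imaged through a 1-D rectangular aperture, so that each image is a $\mathrm{SINC}^2$ pattern. The coordinate $u$ is measured in units of the distance from the center of one pattern to its first zero; these modeling choices and the function names are assumptions for illustration.

```python
import math

def sinc(u):
    """SINC[u] = sin(pi u) / (pi u), with SINC[0] = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def two_source_irradiance(u, separation):
    """Sum of the irradiance patterns of two equal sources centered at +/- separation/2."""
    return sinc(u - separation / 2.0) ** 2 + sinc(u + separation / 2.0) ** 2

def midpoint_ratio(separation):
    """Irradiance midway between the sources divided by the irradiance at one source."""
    at_middle = two_source_irradiance(0.0, separation)
    at_source = two_source_irradiance(separation / 2.0, separation)
    return at_middle / at_source

for separation in (2.0, 1.0, 0.5):
    print(f"separation = {separation}: midpoint / peak = {midpoint_ratio(separation):.3f}")
```

When the separation equals the distance to the first zero, the irradiance midway between the sources dips to $8/\pi^2$ (about 81%) of the value at either source; at half that separation the midpoint is brighter than the source locations and the two patterns merge into a single peak.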


Fraunhofer  Diffraction  in  Optical  Imaging  Systems 

Consider  a  monochromatic  point  object  located  a  long  distance  away  from  an  imaging 
system,  so  that  the  wavefronts  are  approximately  plane  waves.  The  entrance  pupil  of 
the  optical  system  (an  image  of  the  aperture  stop)  “selects”  a  portion  of  a  plane  wave. 
If  the  system  consisted  only  of  the  entrance  pupil  (which  would  then  be  identical  to 
the  aperture  stop),  then  the  light  would  continue  to  propagate.  If  observed  a  long 
distance  from  the  stop,  we  would  see  the  Fraunhofer  diffraction  pattern  of  the  stop; 
the  smaller  the  stop,  the  larger  the  diffraction  pattern.  If  the  object  consisted  of  two 
monochromatic  point  sources  displaced  by  a  small  angle,  the  diffracted  amplitude 
would  be  the  sum  of  two  replicas  of  the  Fraunhofer  diffracted  amplitude  slightly 
displaced.  The  observed  irradiance  is  the  time  average  of  the  squared  magnitude  of 
this amplitude. If the aperture stop is “wide” in some sense, then the diffraction
patterns will be “narrow” and the fact that the object consisted of two point sources
may be apparent. If the stop is “narrow” and the diffraction patterns are “wide,” the
patterns from the two sources may overlap and be difficult to distinguish.

Figure 11.3: Fraunhofer diffraction of stop: the monochromatic point object is located
a large distance to the left of the stop and the diffracted light forms a Fraunhofer
irradiance pattern on the observation screen a large distance from the stop; if the object
consists of two point sources, the diffracted amplitude is the sum of the translated
individual amplitudes.


Of course the optical imaging system consists of more than a stop; it also includes
lenses and/or mirrors that change the curvature of the plane wavefront to create an
(approximately) spherical wave that converges to the real image point on the sensor.
We can interpret the action of the optics as “bringing infinity close,” i.e., it is bringing
the light pattern that would have been generated at a large distance from the stop
onto the sensor that is a short distance from the stop. In other words, the optics
move the Fraunhofer diffraction pattern of the stop from its original location (at $\infty$)
to the image plane. The image of a point object created by the imaging system is
a scaled replica of the Fraunhofer diffraction pattern of the aperture stop; this is the
impulse response of the optical imaging system in monochromatic (coherent) light.

[Figure labels: image plane; scaled replica of Fraunhofer diffraction pattern of S.]

An optical system imaging a point source creates a scaled replica of the Fraunhofer
diffraction pattern of the stop on the image plane. In other words, the impulse
response of the imaging system is the Fraunhofer diffraction pattern of the aperture
stop.


Effect  of  Diffraction  on  Image  Quality 

Diffraction is the spreading of light away from rectilinear propagation, which may be
modeled as the propagation of spherical waves; it provides the fundamental and
ultimate limitation on the capability of an optical system to create images. Consider
the image of a point source located at an infinite distance from a simple single-lens
imaging system. The source generates a spherical wave that is collected by the pupil
of the lens. The lens tries to change the curvature to create a converging spherical
wave that forms a perfect point image at the focal point of the lens. However, the
principles of diffraction require that every point at the pupil of the lens is a point source
of spherical waves that are summed to create the image. For this sum to create an
ideal point image, the electric fields from all points in the pupil would have to cancel
exactly everywhere except at the ideal image point, which cannot happen. Instead,
the electric fields superpose and the resulting irradiance creates an image whose size
and shape depend on the size and shape of the pupil. In other words, diffraction of
light from the pupil of the lens determines the size of the image of the ideal point
source.

The  optical  elements  in  most  imaging  systems  have  circular  cross-sections.  The 
diffraction  pattern  of  the  circular  pupil  also  is  circularly  symmetric,  but  has  a  finite 
size  (linear  extent).  If  two  point  sources  are  sufficiently  close  together  (separated 
by  a  small  angle),  the  circularly  symmetric  images  will  overlap,  and  may  not  be 
distinguishable.  The  smallest  angular  separation  that  produces  separable  images 
determines the resolution of the imaging system. Without proof, we present the
equation derived by Lord Rayleigh for the resolution of the imaging system. If the
sources emit light at wavelength $\lambda$, are located a distance $L$ from the lens, and are
separated by a distance $d$, the diameter $D$ of the image spot is:

$$D \cong \frac{L\lambda}{d}$$
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Note  the  similarity  to  the  formula  for  the  separation  between  interference  fringes  in 
Young’s  experiment.  The  angular  diameter  of  the  image  spot  is: 


$$\Delta\theta = \frac{d}{L}$$

which implies the relationship:

$$\Delta\theta \propto \frac{\lambda}{D}$$

The more accurate relation for imaging elements with circular cross-sections is:

$$\Delta\theta = 1.22\,\frac{\lambda}{D}$$


For the 200-inch-diameter Hale telescope on Palomar Mountain ($D \cong 5.08$ m), the
theoretical minimum angular separation is $\Delta\theta \cong 0.24\,\lambda$ radians, for $\lambda$ measured in
meters. In green light ($\lambda = 550$ nm), the angular separation is:

$$\Delta\theta \cong 1.32\cdot10^{-7}\ \text{radians} = 0.132\ \mu\text{radians} \cong 0.03\ \text{arc-seconds}$$
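The arithmetic is easy to reproduce. The sketch below applies the relation $\Delta\theta = 1.22\,\lambda/D$ for a circular aperture to the 200-inch aperture and, for comparison, to a 2.4 m aperture; the function and variable names are illustrative.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi

def rayleigh_limit(wavelength_m, diameter_m):
    """Minimum angular separation [radians] for a circular aperture: 1.22 * lambda / D."""
    return 1.22 * wavelength_m / diameter_m

wavelength = 550e-9                  # green light [m]
apertures = {
    "200-inch aperture": 200 * 0.0254,   # 200 inches expressed in meters (5.08 m)
    "2.4 m aperture": 2.4,
}

for name, diameter in apertures.items():
    theta = rayleigh_limit(wavelength, diameter)
    print(f"{name}: {theta:.3e} rad = {theta * ARCSEC_PER_RADIAN:.3f} arc-seconds")
```

The smaller aperture gives an angular limit roughly twice as large, as expected from the inverse dependence on $D$.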


Of course, the ultimate resolution of the Hale telescope actually is limited by
atmospheric turbulence, which creates random variations in the air temperature and
thus in the refractive index. These variations are often decomposed into the
aberrations introduced into the wavefront by the phase errors. The constant phase
(“piston”) error has no effect on the irradiance (the squared magnitude of the
amplitude). Linear phase errors (“tip-tilt”) move the image from side to side and up-down.
Quadratic phase errors (“defocus”) act like additional lenses that move the image
plane backwards or forwards along the optical axis. In general, the tip-tilt error
is the most significant, which means that correcting this aberration significantly
improves the image quality. The field of correcting atmospheric aberrations is called
“adaptive optics,” and is an active research area.

The diameter of the primary mirror of the Hubble Space Telescope is approximately
1/2 that of the Hale telescope ($D \cong 2.4$ m), so the angular resolution of the
optics at 550 nm is approximately twice as large ($\cong 0.28\ \mu$rad $\cong 0.06$ arc-seconds). Of
course, there is no atmosphere to mess up the Hubble images.


Chapter  12 

Basic  Principles  of  Digital  Image 
Processing 


During  the  last  decade,  inexpensive  yet  powerful  digital  computers  have  become 
widely  available  and  have  been  applied  to  a  multitude  of  tasks.  By  hitching  com¬ 
puters  with  imaging  detectors  and  displays,  very  capable  systems  for  creating  and 
analyzing  imagery  have  been  constructed  and  are  being  applied  in  many  arenas.  For 
example,  they  now  are  used  to  reconstruct  x-ray  and  magnetic  resonance  images  in 
medicine,  to  analyze  multispectral  aerial  and  satellite  images  for  environmental  and 
military  uses,  to  read  Universal  Product  Codes  that  specify  products  and  prices  in 
retail  stores,  just  to  name  a  few. 

This part of the course will investigate the basic principles and introductory
applications of digital imaging systems, and includes many simple examples to illustrate
the concepts. Most of the images used to illustrate the concepts are rather “crude”,
consisting of only 4096 individual picture elements (pixels) in a 64 × 64 array. Each
pixel has one of up to 64 different brightnesses (gray values or digital counts). The
crudeness of the examples is intentional because it allows the effects due to
processing on individual pixels to be apparent. In no way do these examples represent the
capabilities of most modern digital imaging systems; indeed, it is usually essential
that individual pixels not be visible so that the image appears to be a continuously
varying function.

IMAGE:  A  reproduction  or  imitation  of  form  of  a  person  or  thing. 

The  optical  counterpart  of  an  object  produced  by  a  lens,  mirror,  etc. 

. Noah  Webster 

We normally think of an image in the sense of a picture, i.e., a planar
representation of the brightness, i.e., the amount of light reflected or transmitted by an
object.

An image is usually a function of two spatial variables, e.g., $f[x,y]$, which
represents the brightness $f$ at the Cartesian location $[x,y]$. It may therefore be graphed
in three dimensions, with brightness on the z-axis.

[Figure: image representation of $f[n,m]$ and the corresponding function of two spatial
coordinates $f[x,y]$.]

An  image  may  have  more  than  two  coordinate  dimensions,  e.g., 


$f[x, y, t_n]$: monochrome “movie”
$f[x, y, \lambda]$: color image
$f[x, y, \lambda_n]$: discrete set of wavelengths, multispectral image
$f[x, y, t]$: time-varying monochrome image
$f[x, y, z]$: 3-D image (e.g., hologram)
$f[x, y, t_n, \lambda_m]$: image discretely sampled in time and wavelength, e.g., color movie
$f[x, y, z, t_n, \lambda_m]$: reality

Note that 2-D slices can be “cut” from multidimensional images, and the resulting
image needn’t be “pictorial,” e.g., consider the 2-D slices “cut” from the 3-D
function $f[x,y,t_n]$; $f[x,y]$ is pictorial, $f[x,t_n]$ is not. But the dimensionality of the
axes has no effect on the computations; it is perfectly feasible for computers to
process and display $f[x,t_n]$ as well as $f[x,y]$.
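The following sketch shows how such slices might be cut from a sampled image. It assumes a hypothetical, very small “movie” $f[x,y,t_n]$ stored as nested lists and indexed as `movie[n][y][x]`; the test pattern (a bright column that moves one pixel per frame) and all names are invented for illustration.

```python
NX, NY, NT = 4, 3, 5     # assumed numbers of columns, rows, and frames

def brightness(x, y, n):
    """Assumed test pattern: a bright column that moves one pixel to the right per frame."""
    return 1 if x == n % NX else 0

movie = [[[brightness(x, y, n) for x in range(NX)]
          for y in range(NY)]
         for n in range(NT)]

def slice_xy(movie, n):
    """Pictorial slice f[x, y] at a fixed frame index n."""
    return movie[n]

def slice_xt(movie, y):
    """Non-pictorial slice f[x, t_n] at a fixed row y: one row of output per frame."""
    return [frame[y] for frame in movie]

print("f[x, y] at n = 1:")
for row in slice_xy(movie, 1):
    print(row)
print("f[x, t_n] at y = 0:")
for row in slice_xt(movie, 0):
    print(row)
```

Both slices are ordinary 2-D arrays and can be processed or displayed by the same routines; in the $f[x,t_n]$ slice the moving column appears as a diagonal line.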


After  converting  image  information  into  an  array  of  integers,  the  image  can  be 
manipulated,  processed,  and  displayed  by  computer.  Computer  processing  is  used 
for  image  enhancement,  restoration,  segmentation,  description,  recognition,  coding, 
reconstruction, and transformation.


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12.1  Digital  Processing 

The  general  digital  image  processing  system  may  be  divided  into  three  components: 
the  input  device  (or  digitizer),  the  digital  processor,  and  the  output  device  (image 
display). 


1. The digitizer converts a continuous-tone and spatially continuous brightness
distribution $f[x,y]$ to a discrete array (the digital image) $f_q[n,m]$, where $n$, $m$,
and $f_q$ are integers.

2. The digital processor operates on the digital image $f_q[n,m]$ to generate a new
digital image $g_q[k,\ell]$, where $k$, $\ell$, and $g_q$ are integers. The output image may be
represented in a different coordinate system, hence the use of different indices
$k$ and $\ell$.

3. The image display converts the digital output image $g_q[k,\ell]$ back into a
continuous-tone and spatially continuous image $g[x,y]$ for viewing. It should be noted that
some systems may not require a display (e.g., in machine vision and artificial
intelligence applications); the output may be a piece of information. For
example, a digital imaging system that was designed to answer the question, “Is
there evidence of a cancerous tumor in this x-ray image?”, ideally would have
two possible outputs (YES or NO), i.e., a single bit of information.



We  shall  first  consider  the  mathematical  description  of  image  digitizing  and  display 
devices,  and  follow  that  by  a  long  discussion  of  useful  processing  operations.  Some 
aspects  of  this  material  are  covered  in  more  depth  in  the  linear  mathematics  sequence, 
SIMG-716,717. 


Chapter  13 

Review  of  Sampling 


13.1  Digitization 

Digitization is the conversion of a continuous-tone and spatially continuous brightness
distribution $f[x,y]$ to a discrete array of integers $f_q[n,m]$ by two operations which
will be discussed in turn:

(A) SAMPLING - a function of continuous coordinates $f[x,y]$ is evaluated on a
discrete matrix of samples indexed by $[n,m]$.

(B) QUANTIZATION - the continuously varying brightness $f$ at each sample is
converted to one of a set of integers $f_q$ by some nonlinear thresholding process.
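The two operations can be sketched in a few lines for a 1-D signal; the 2-D case applies the same steps along both axes. The brightness function (a linear ramp), the sampling interval, the number of gray levels, and the truncating quantizer are all assumptions chosen for illustration, not the specific digitizer described in the text.

```python
def f(x):
    """Assumed continuous brightness: a linear ramp from 0 to 1 over 0 <= x < 8."""
    return x / 8.0

def sample(func, count, dx):
    """(A) SAMPLING: evaluate the continuous function at x = n * dx, n = 0 .. count-1."""
    return [func(n * dx) for n in range(count)]

def quantize(value, levels):
    """(B) QUANTIZATION: map a brightness in [0, 1] to an integer in [0, levels - 1]."""
    q = int(value * levels)        # truncation, one possible thresholding rule
    return min(q, levels - 1)      # keep a brightness of exactly 1.0 in the top level

samples = sample(f, 8, 1.0)
digital = [quantize(v, 4) for v in samples]   # 4 gray levels (2 bits per sample)

print("sampled values: ", samples)
print("quantized values:", digital)
```

Pairs of neighboring samples with different brightness map to the same integer, which is an example of the information lost when the image is digitized.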

The  digital  image  is  a  matrix  of  picture  elements,  or  pixels  if  your  ancestors  are 
computers. Video descendants (and imaging science undergraduates) often speak of
pels (often misspelled pelz). Each matrix element is an integer which encodes the
brightness  at  that  pixel.  The  integer  value  is  called  the  gray  value  or  digital  count  of 
the  pixel. 

Computers store integers as BInary digiTS, or bits (0, 1).

2 bits can represent: 00△ = 0., 01△ = 1., 10△ = 2., 11△ = 3.; a total of $2^2 = 4$
numbers.

(The symbol “△” denotes the binary analogue to the decimal point; that is,
the binary point separates the bits weighted by positive powers of 2 from those weighted
by negative powers of 2.)

$m$ BITS can represent $2^m$ numbers ==> 8 BITS = 1 BYTE ==> 256 decimal
numbers, [0, 255]
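The count of representable numbers is easily enumerated. The helper below is a hypothetical illustration that lists every unsigned integer that fits in $m$ bits together with its bit pattern.

```python
def representable_values(m):
    """Return (bit pattern, decimal value) pairs for all unsigned m-bit integers."""
    return [(format(v, f"0{m}b"), v) for v in range(2 ** m)]

for pattern, value in representable_values(2):
    print(pattern, "=", value)

byte_values = representable_values(8)
print("8 bits:", len(byte_values), "numbers, from",
      byte_values[0][1], "to", byte_values[-1][1])
```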


Note that a digitized image contains a finite amount of information: the number
of  bits  required  to  store  the  data.  This  will  usually  be  less  than  the  quantity  of 
information  in  the  original  image.  In  other  words,  digitization  creates  errors.  We  will 
discuss  digitizing  and  reconstruction  error  after  describing  the  image  display  process. 
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13.2  Sampling 

This  operation  derives  a  discrete  set  of  data  points  at  (usually)  uniform  spacing.  In  its 
simplest  form,  sampling  is  expressed  mathematically  as  multiplication  of  the  original 
image  by  a  function  that  measures  the  image  brightness  at  discrete  locations: 

$$f_s[n\cdot\Delta x] = f[x]\cdot s[x; n\cdot\Delta x]$$

where:

$f[x]$ = brightness distribution of input image
$s[x; n\cdot\Delta x]$ = sampling function
$f_s[n\cdot\Delta x]$ = sampled input image defined at coordinates $n\cdot\Delta x$

The ideal sampling function for functions of continuous variables is generated from
the so-called “Dirac delta function” $\delta[x]$, which is defined by many authors, including
Gaskill. For the (somewhat less rigorous) purpose here, we may consider the sampling
function to be the sum of uniformly spaced “discrete” Dirac delta functions, which
Gaskill calls the COMB and Bracewell calls the SHAH:

$$s[x; n\cdot\Delta x] = \begin{cases} 1 & \text{if } x = n\cdot\Delta x \ \ (n = 0, \pm 1, \pm 2, \ldots) \\ 0 & \text{otherwise} \end{cases}$$

The COMB function defined by Gaskill (called the SHAH function by Bracewell).

13.2.1  Ideal  Sampling 

Multiplication of the input $f[x]$ by a COMB function merely evaluates $f[x]$ on the
uniform grid of points located at $n\cdot\Delta x$, where $n$ is an integer. Because it measures
the value of the input at an infinitesimal point, this is a mathematical idealization
that cannot be implemented in practice. Even so, the discussion of ideal sampling
usefully introduces some essential concepts.

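Consider a sinusoidal input function with spatial period $X_0$ that
is ideally sampled at intervals separated by $\Delta x$:

$$f[x] = \frac{1}{2}\left(1+\cos\left[\frac{2\pi x}{X_0}+\phi_0\right]\right)$$

$$f_s[n\cdot\Delta x] = \frac{1}{2}\left(1+\cos\left[\frac{2\pi x}{X_0}+\phi_0\right]\right)\cdot\mathrm{COMB}\left[\frac{x}{\Delta x}\right]$$

The amplitude of the function at the sample indexed by $n$ is:

$$f_s[n\cdot\Delta x] = \frac{1}{2}\left(1+\cos\left[\frac{2\pi x}{X_0}+\phi_0\right]\right)\cdot\delta[x-n\cdot\Delta x]$$

$$\Longrightarrow f_s[n\cdot\Delta x] = \frac{1}{2}\left(1+\cos\left[2\pi n\,\frac{\Delta x}{X_0}+\phi_0\right]\right)$$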


The dimensionless parameter $\Delta x/X_0$ is the ratio of the sampling interval to the spatial
period (wavelength) of the sinusoid and is a measurement of the fidelity of the sampled
image. Mathematical expressions for the sampled function $f_s$ obtained for several
values of $\Delta x/X_0$ are:

Case I: $\Delta x/X_0 = \frac{1}{12}$, $\phi_0 = 0$ $\Longrightarrow$ $f_s[n] = \frac{1}{2}\left(1+\cos\left[\frac{\pi n}{6}\right]\right)$
Case II: $\Delta x/X_0 = \frac{1}{2}$, $\phi_0 = 0$ $\Longrightarrow$ $f_s[n] = \frac{1}{2}\left(1+\cos[\pi n]\right) = \frac{1}{2}\left[1+(-1)^n\right]$
Case III: $\Delta x/X_0 = \frac{1}{2}$, $\phi_0 = -\frac{\pi}{2}$ $\Longrightarrow$ $f_s[n] = \frac{1}{2}\left(1+\sin[\pi n]\right) = \frac{1}{2}$
Case IV: $\Delta x/X_0 = \frac{3}{4}$, $\phi_0 = 0$ $\Longrightarrow$ $f_s[n] = \frac{1}{2}\left(1+\cos\left[\frac{3\pi n}{2}\right]\right)$
Case V: $\Delta x/X_0 = \frac{5}{4}$, $\phi_0 = 0$ $\Longrightarrow$ $f_s[n] = \frac{1}{2}\left(1+\cos\left[\frac{5\pi n}{2}\right]\right)$

[Figure panels: the input $f[x]$ and the samples $f_s[n]$ for several values of $\Delta x/X_0$,
including: sampled with $\Delta x/X_0 = \frac{1}{2}$; $f[x]$ translated by $\frac{1}{4}$ cycle and sampled with
$\Delta x/X_0 = \frac{1}{2}$ (no variation of samples); sampled with $\Delta x/X_0 = \frac{3}{4}$ (incorrect output:
$X_0' = 3X_0$ sampled 4 times per period); sampled with $\Delta x/X_0 = \frac{5}{4}$ ($X_0' = 5X_0$ sampled
4 times per period).]

Illustration of sampling of a biased sinusoid, showing aliasing if the signal oscillates
with a period smaller than $2\cdot\Delta x$.


The output evaluated for $\Delta x / X_0 = \frac{1}{2}$ depends on the phase of the sinusoid; if sampled
at the extrema, then the sampled signal has the same dynamic range as $f[x]$ (i.e.,
it is fully modulated); at other phases the samples may show no modulation, or any
intermediate value. The interval $\Delta x = \frac{X_0}{2}$ defines the Nyquist sampling limit. If
$\Delta x / X_0 > \frac{1}{2}$ (fewer than two samples per period), then the same set of samples could
have been obtained from a sinusoid with a longer period. For example, if $\Delta x / X_0 = \frac{3}{4}$,
then the reconstructed function appears as though obtained from a sinusoid with period
$X_0' = 3X_0$ sampled with $\Delta x / X_0' = \frac{1}{4}$. In other words, the data set of samples is
ambiguous; the same samples could be obtained from more than one input, and thus we
cannot distinguish among the possible inputs based only on knowledge of the samples.
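
The ambiguity can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `sample_biased_cosine` is hypothetical, and the sinusoid is assumed to have zero initial phase. It compares the ideal samples obtained with $\Delta x / X_0 = 3/4$ against those obtained with $\Delta x / X_0' = 1/4$.

```python
import math

def sample_biased_cosine(ratio, n_samples, phase=0.0):
    """Ideal samples f_s[n] = (1 + cos(2*pi*n*ratio + phase)) / 2,
    where ratio is the dimensionless parameter dx / X0."""
    return [0.5 * (1.0 + math.cos(2.0 * math.pi * n * ratio + phase))
            for n in range(n_samples)]

# Undersampled sinusoid: dx / X0 = 3/4 (fewer than two samples per period).
undersampled = sample_biased_cosine(3 / 4, 8)
# Sinusoid with the longer period X0' = 3 * X0, so that dx / X0' = 1/4.
alias = sample_biased_cosine(1 / 4, 8)

for n, (a, b) in enumerate(zip(undersampled, alias)):
    print(n, round(a, 6), round(b, 6))
```

The two columns agree at every sample index, so the samples alone cannot reveal which of the two sinusoids was the input.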


13.3 Aliasing: Whittaker-Shannon Sampling Theorem

As just demonstrated, the sample values obtained from a sinusoid which has been
sampled fewer than two times per period will be identical to those from a sinusoid
with a longer period. This ambiguity is called aliasing in sampling, but similar effects
show up whenever periodic functions are multiplied or added. In other disciplines, these
go by different names such as beats, Moiré fringes, and heterodyning. To illustrate,
consider the product of two sinusoidal functions with the different periods $X_1$ and $X_2$
(and thus spatial frequencies $\xi_1 = \frac{1}{X_1}$, $\xi_2 = \frac{1}{X_2}$):

$$\cos[2\pi \xi_1 x] \cdot \cos[2\pi \xi_2 x] = \frac{1}{2}\cos[2\pi(\xi_1 + \xi_2)x] + \frac{1}{2}\cos[2\pi(\xi_1 - \xi_2)x]$$

The second term oscillates slowly and is the analog of the aliased signal.
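
The product identity is easy to confirm numerically. This is a small sketch with hypothetical function names and arbitrarily chosen frequencies; it evaluates both sides of the identity at a few coordinates.

```python
import math

def product_of_cosines(xi1, xi2, x):
    """Left-hand side: cos[2*pi*xi1*x] * cos[2*pi*xi2*x]."""
    return math.cos(2 * math.pi * xi1 * x) * math.cos(2 * math.pi * xi2 * x)

def sum_and_difference(xi1, xi2, x):
    """Right-hand side: half-amplitude cosines at the sum and difference frequencies."""
    return (0.5 * math.cos(2 * math.pi * (xi1 + xi2) * x)
            + 0.5 * math.cos(2 * math.pi * (xi1 - xi2) * x))

xi1, xi2 = 0.25, 0.20  # illustrative spatial frequencies (cycles per unit)
for x in (0.0, 1.5, 7.0, 20.0):
    print(x, product_of_cosines(xi1, xi2, x), sum_and_difference(xi1, xi2, x))
```

With these values the difference term has frequency 0.05 cycles per unit, i.e., a period of 20 units, which is much longer than either input period.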

Though the proof is beyond our mathematical scope at this time, we state that
a sinusoidal signal that has been sampled without aliasing can be perfectly
reconstructed from its ideal samples. This will be demonstrated in the section on image
displays. Also without proof, we make the following claim:


Any function can be expressed as a unique sum of sinusoidal components
with (generally) different amplitudes, frequencies, and phases.


If the sinusoidal representation of $f[x]$ has a component with a maximum spatial
frequency $\xi_{max}$, and if we sample $f[x]$ so that this component is sampled without
aliasing, then all sinusoidal components of $f[x]$ will be adequately sampled and $f[x]$ can
be perfectly reconstructed from its samples. Such a function is band-limited and
$\xi_{max}$ is the cutoff frequency of $f[x]$. The corresponding minimum spatial period is
$X_{min} = \frac{1}{\xi_{max}}$. Thus the sampling interval $\Delta x$ can be found from:

$$\frac{\Delta x}{X_{min}} < \frac{1}{2} \Rightarrow \Delta x < \frac{X_{min}}{2} \Rightarrow \Delta x < \frac{1}{2\xi_{max}}$$

This is the Whittaker-Shannon sampling theorem. The limiting value of the sampling
interval $\Delta x = \frac{1}{2\xi_{max}}$ defines the Nyquist sampling limit. Sampling more or less
frequently than the Nyquist limit is oversampling or undersampling, respectively:

$$\Delta x > \frac{1}{2\xi_{max}} \Rightarrow \text{undersampling}$$

$$\Delta x < \frac{1}{2\xi_{max}} \Rightarrow \text{oversampling}$$

The Whittaker-Shannon sampling theorem is valid for all types of sampled signals.
A familiar example is digital recording of audio signals (e.g., for compact discs or
digital audio tape). The sampling interval is determined by the maximum audible
frequency of the human ear, which is generally accepted to be approximately 20 kHz.
The sampling frequency of digital audio recorders is 44,100 samples per second, which
translates to a sampling interval of $\frac{1}{44{,}100}$ s $\approx 22.7\ \mu$s. At this sampling
rate, sounds with periods greater than $2 \cdot 22.7\ \mu$s $= 45.4\ \mu$s (or frequencies less than
$(45.4\ \mu\text{s})^{-1} \approx 22$ kHz) can theoretically be reconstructed perfectly, assuming that
$f[t]$ is sampled perfectly (i.e., at a point). Note that if the input signal frequency is
greater than the Nyquist frequency of 22 kHz, the signal will be aliased and will appear
as a lower-frequency signal in the audible range. Thus the reconstructed signal
will be wrong. This is prevented by ensuring that no signals with frequencies above
the Nyquist limit are allowed to reach the sampler; higher frequencies are filtered out
before sampling.
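
The arithmetic of the audio example can be sketched as follows. The helper names are hypothetical, and the 30 kHz test tone is an assumed illustration rather than a value from the text; the folding rule applies to an ideally sampled sinusoid with no prefilter.

```python
def nyquist_frequency(sampling_rate_hz):
    """Highest frequency that is sampled at least twice per period."""
    return sampling_rate_hz / 2.0

def aliased_frequency(f_hz, sampling_rate_hz):
    """Apparent frequency of an ideally sampled sinusoid,
    folded into the band from 0 to the Nyquist frequency."""
    f = f_hz % sampling_rate_hz
    return min(f, sampling_rate_hz - f)

rate = 44_100.0                      # samples per second
interval_us = 1e6 / rate             # sampling interval in microseconds
print(interval_us, nyquist_frequency(rate))
print(aliased_frequency(30_000.0, rate))  # a tone above the Nyquist frequency
print(aliased_frequency(10_000.0, rate))  # a tone below it is unchanged
```

A tone above the Nyquist frequency reappears inside the audible band, which is why the input is low-pass filtered before it reaches the sampler.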


13.4 Realistic Sampling: Averaging by the Detector


Signals cannot really be sampled at infinitesimal points by multiplication by a COMB;
this would mean that the signal would be measured by a detector that has infinitesimal
area, and such a measurement would have infinitesimal magnitude. In realistic sampling,
the continuous input is measured at uniformly spaced samples by using a detector
with finite spatial (or temporal) size. The measured signal is an average of the input
over the detector area, and the image structure is blurred by the averaging process:

Realistic  sampling  averages  the  signal  over  a  finite  area  and  blurs  information 
about  fine  structure  that  existed  in  the  original  continuous  image. 


The discrete samples are obtained by averaging the input at the sample coordinates.
This is mathematically equivalent to averaging the continuous input $f[x]$ with
the detector weighting function $h_1[x]$ and sampling the result by multiplication with
a COMB function. Spatial averaging may be expressed as the integral of the product
of the input function and the averaging (weighting) function $h_1[x]$, and is called a
convolution. The averaging process is sometimes called prefiltering, or antialiasing:

$$\int_{-\infty}^{+\infty} f[x_0 - x] \cdot h_1[x] \, dx = \left(f[x] * h_1[x]\right)\big|_{x = x_0}$$

The sampled signal is obtained by multiplying the averaged signal by the COMB:

$$f_s[n \cdot \Delta x] = \left(f[x] * h_1[x]\right) \cdot \mathrm{COMB}\left[\frac{x}{\Delta x}\right]$$
where:

$f[x]$ = brightness distribution of input image
$h_1[x]$ = antialiasing prefilter
$f_s[n \cdot \Delta x]$ = sampled input image defined at coordinates $n \cdot \Delta x$


Realistic sampling is composed of two cascaded operations:

(1)  averaging  (convolution,  prefiltering)  over  a  detector  function,  and 

(2)  multiplication  by  an  ideal  sampling  function  COMB 
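
The two cascaded operations can be sketched numerically. This is an illustrative sketch under stated assumptions: the detector weighting is taken to be uniform over a width `width`, the average is approximated with a midpoint rule, and all function names and the chosen period and widths are hypothetical.

```python
import math

def biased_cosine(x, period, phase=0.0):
    """Nonnegative sinusoid f[x] = (1 + cos(2*pi*x/period + phase)) / 2."""
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * x / period + phase))

def detector_average(f, center, width, steps=1000):
    """Step 1: uniform average of f over [center - width/2, center + width/2]
    (midpoint rule). A width of zero reduces to ideal point sampling."""
    if width == 0:
        return f(center)
    h = width / steps
    start = center - width / 2.0
    return sum(f(start + (k + 0.5) * h) for k in range(steps)) / steps

def realistic_samples(f, dx, width, n_samples):
    """Step 2: keep the averaged signal only at the coordinates x = n * dx."""
    return [detector_average(f, n * dx, width) for n in range(n_samples)]

def modulation(values):
    """m = (max - min) / (max + min) for a nonnegative signal."""
    return (max(values) - min(values)) / (max(values) + min(values))

period, dx = 16.0, 1.0
f = lambda x: biased_cosine(x, period)
ideal = realistic_samples(f, dx, 0.0, 16)     # point detector
blurred = realistic_samples(f, dx, 8.0, 16)   # detector half a period wide
print(modulation(ideal), modulation(blurred))
```

The finite detector leaves the mean level unchanged but lowers the modulation of the sampled sinusoid.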


The nature of the antialiasing prefilter determines the effect of realistic sampling on
the output. This is typically characterized by measuring the effect on the modulation
of a sinusoidal wave $f[x] = \frac{1}{2}\left(1 + \cos[2\pi \xi_0 x]\right)$. The modulation of a sinusoid is
defined as:

$$m = \frac{f_{max} - f_{min}}{f_{max} + f_{min}} \quad \text{for } 0 \le m \le 1$$

Note that modulation is defined for nonnegative (i.e., biased) sinusoids ONLY. The
analogous quantity for a nonnegative square wave is called contrast. For example,
consider a sinusoid with unit modulation that is sampled by an array with elements
of width $d$ spaced at intervals of width $\Delta x$ as shown:


Schematic  of  sampling  of  a  biased  nonnegative  sinusoid  with  detectors  of  width  d 

spaced  at  intervals  of  Ax. 

The  signal  is  averaged  over  the  detector  area,  e.g.,  the  sampled  value  at  n  =  0  is: 


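$$f_s[n = 0] = \frac{1}{d}\int_{-d/2}^{+d/2} f[x] \, dx = \frac{1}{d}\int_{-\infty}^{+\infty} f[x] \cdot \mathrm{RECT}\left[\frac{x}{d}\right] dx$$

where $\mathrm{RECT}\left[\frac{x}{d}\right]$ is unity for $|x| < \frac{d}{2}$ and zero outside that interval.

For $f[x]$ as defined above, the set of samples is derived by integrating $f[x]$ over
the area of width $d$ centered at coordinates that are integer multiples of $\Delta x$:

$$f_s[n] = \frac{1}{d}\int_{n \cdot \Delta x - d/2}^{n \cdot \Delta x + d/2} \frac{1}{2}\left(1 + \cos\left[\frac{2\pi x}{X_0} + \phi_0\right]\right) dx$$

$$= \frac{1}{2d}\Big[x\Big]_{n \cdot \Delta x - d/2}^{n \cdot \Delta x + d/2} + \frac{1}{2d} \cdot \frac{X_0}{2\pi}\left[\sin\left[\frac{2\pi x}{X_0} + \phi_0\right]\right]_{n \cdot \Delta x - d/2}^{n \cdot \Delta x + d/2}$$

$$= \frac{1}{2} + \frac{1}{2d} \cdot \frac{X_0}{2\pi}\left(\sin\left[2\pi n \frac{\Delta x}{X_0} + \frac{\pi d}{X_0} + \phi_0\right] - \sin\left[2\pi n \frac{\Delta x}{X_0} - \frac{\pi d}{X_0} + \phi_0\right]\right)$$

By defining $\alpha = 2\pi n \cdot \frac{\Delta x}{X_0} + \phi_0$ and $\beta = \frac{\pi d}{X_0}$, and by using the trigonometric identity:

$$\sin[\alpha + \beta] - \sin[\alpha - \beta] = 2\cos\alpha \sin\beta,$$

we find an expression for the integral over the detector area:

$$f_s[n] = \frac{1}{2} + \frac{X_0}{4\pi d} \cdot 2\cos\left[2\pi n \cdot \frac{\Delta x}{X_0} + \phi_0\right]\sin\left[\frac{\pi d}{X_0}\right] = \frac{1}{2}\left(1 + \mathrm{SINC}\left[\frac{d}{X_0}\right] \cdot \cos\left[2\pi n \cdot \frac{\Delta x}{X_0} + \phi_0\right]\right)$$

where $\mathrm{SINC}[\alpha] \equiv \frac{\sin[\pi\alpha]}{\pi\alpha}$.

Graph of $\mathrm{SINC}[x] = \frac{\sin[\pi x]}{\pi x}$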


Note that for constant functions $X_0 = \infty$ and $\mathrm{SINC}\left[\frac{d}{X_0}\right] = \mathrm{SINC}[0] = 1$; uniform weighted
averaging has no effect on constant inputs. The samples of a cosine of period $X_0$
obtained with sampling interval $\Delta x$ in the two cases are:

Realistic: $f_s[n] = \frac{1}{2}\left(1 + \mathrm{SINC}\left[\frac{d}{X_0}\right] \cdot \cos\left[2\pi n \cdot \frac{\Delta x}{X_0} + \phi_0\right]\right)$

Ideal: $f_s[n] = \frac{1}{2}\left(1 + \cos\left[2\pi n \cdot \frac{\Delta x}{X_0} + \phi_0\right]\right)$

where $d$ is the width of the detector. The amplitude of the realistic case is multiplied
by a factor of $\mathrm{SINC}\left[\frac{d}{X_0}\right]$, which is less than unity everywhere except at the
origin, i.e., where $d = 0$ or $X_0 = \infty$. As the detector size increases relative to the
spatial period of the cosine (i.e., as $\frac{d}{X_0}$ increases), $\mathrm{SINC}\left[\frac{d}{X_0}\right] \to 0$ and the
modulation of the sinusoid decreases.

The modulation of the image of a sine wave of period $X_0$, or spatial frequency
$\xi_0 = \frac{1}{X_0}$, is reduced by a factor $\mathrm{SINC}\left[\frac{d}{X_0}\right] = \mathrm{SINC}[d\xi_0]$.


Example  of  Reduced  Modulation  due  to  Prefiltering 


The input function $f[x]$ has a period of 128 units, with two periods plotted. It is the
sum of six sinusoidal components plus a constant. The periods of the component
sinusoids are:

$X_{01} = 128$ units $\Rightarrow \xi_1 = \frac{1}{128}$ cycles per unit $\approx 0.0078$ cycles per unit
$X_{02} = \frac{128}{3}$ units $\approx 42.7$ units $\Rightarrow \xi_2 = \frac{3}{128}$ cycles per unit $\approx 0.023$ cycles per unit
$X_{03} = \frac{128}{5}$ units $= 25.6$ units $\Rightarrow \xi_3 = \frac{5}{128}$ cycles per unit $\approx 0.039$ cycles per unit
$X_{04} = \frac{128}{7}$ units $\approx 18.3$ units $\Rightarrow \xi_4 = \frac{7}{128}$ cycles per unit $\approx 0.055$ cycles per unit
$X_{05} = \frac{128}{9}$ units $\approx 14.2$ units $\Rightarrow \xi_5 = \frac{9}{128}$ cycles per unit $\approx 0.070$ cycles per unit
$X_{06} = \frac{128}{11}$ units $\approx 11.6$ units $\Rightarrow \xi_6 = \frac{11}{128}$ cycles per unit $\approx 0.086$ cycles per unit



The constant bias of 0.5 ensures that the function is positive. The first sinusoidal
component ($X_{01} = 128$ units) is the fundamental and carries most of the modulation
of the image; the other components (the higher harmonics) have less amplitude. The
spatial frequency of each component is much less than the Nyquist limit of 0.5 cycles
per unit. For a detector width of 8 units, the modulations of the components are
reduced by the factors:

$\mathrm{SINC}[d\xi_1] = \mathrm{SINC}\left[8 \cdot \frac{1}{128}\right] \approx 0.994$
$\mathrm{SINC}[d\xi_2] = \mathrm{SINC}\left[8 \cdot \frac{3}{128}\right] \approx 0.943$
$\mathrm{SINC}[d\xi_3] = \mathrm{SINC}\left[8 \cdot \frac{5}{128}\right] \approx 0.847$
$\mathrm{SINC}[d\xi_4] = \mathrm{SINC}\left[8 \cdot \frac{7}{128}\right] \approx 0.714$
$\mathrm{SINC}[d\xi_5] = \mathrm{SINC}\left[8 \cdot \frac{9}{128}\right] \approx 0.555$
$\mathrm{SINC}[d\xi_6] = \mathrm{SINC}\left[8 \cdot \frac{11}{128}\right] \approx 0.385$



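Note that the modulation of sinusoidal components with shorter periods (higher
frequencies) is diminished more severely by the averaging. A set of prefiltered images
for several different averaging widths is shown on a following page. If the detector
width is 32 units, the resulting modulations are:

$\mathrm{SINC}[d\xi_1] = \mathrm{SINC}\left[32 \cdot \frac{1}{128}\right] \approx +0.900$
$\mathrm{SINC}[d\xi_2] = \mathrm{SINC}\left[32 \cdot \frac{3}{128}\right] \approx +0.300$
$\mathrm{SINC}[d\xi_3] = \mathrm{SINC}\left[32 \cdot \frac{5}{128}\right] \approx -0.180$
$\mathrm{SINC}[d\xi_4] = \mathrm{SINC}\left[32 \cdot \frac{7}{128}\right] \approx -0.129$
$\mathrm{SINC}[d\xi_5] = \mathrm{SINC}\left[32 \cdot \frac{9}{128}\right] \approx +0.100$
$\mathrm{SINC}[d\xi_6] = \mathrm{SINC}\left[32 \cdot \frac{11}{128}\right] \approx +0.082$

Note that the third and fourth components (periods $X_{03}$ and $X_{04}$) have negative
modulation, i.e., $f_{max} < f_{min}$. The contrast of those components is reversed. As shown,
the sampled image looks like a sawtooth with a period of 128 units.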


If the detector size is 128, each component is averaged over an integer number of
periods and the result is just the constant bias; the modulation of the output is zero:

$\mathrm{SINC}[d\xi_1] = \mathrm{SINC}\left[128 \cdot \frac{1}{128}\right] = \mathrm{SINC}[1] = 0$
$\mathrm{SINC}[d\xi_2] = \mathrm{SINC}\left[128 \cdot \frac{3}{128}\right] = \mathrm{SINC}[3] = 0$

and likewise for the remaining components.


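For a detector width of 170 units, the modulations are:

$\mathrm{SINC}[d\xi_1] = \mathrm{SINC}\left[170 \cdot \frac{1}{128}\right] \approx -0.206$
$\mathrm{SINC}[d\xi_2] = \mathrm{SINC}\left[170 \cdot \frac{3}{128}\right] \approx -0.004$
$\mathrm{SINC}[d\xi_3] = \mathrm{SINC}\left[170 \cdot \frac{5}{128}\right] \approx +0.043$
$\mathrm{SINC}[d\xi_4] = \mathrm{SINC}\left[170 \cdot \frac{7}{128}\right] \approx -0.028$
$\mathrm{SINC}[d\xi_5] = \mathrm{SINC}\left[170 \cdot \frac{9}{128}\right] \approx -0.004$
$\mathrm{SINC}[d\xi_6] = \mathrm{SINC}\left[170 \cdot \frac{11}{128}\right] \approx +0.021$
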


Because the first (largest amplitude) sinusoidal component has negative modulation,
so does the resulting image. The overall image contrast is reversed; darker areas
of the input become brighter in the image.
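
The modulation factors for this example can be recomputed directly from the SINC definition. The sketch below is illustrative only: the function name `sinc` is hypothetical, and the detector width of 8 units is an assumption chosen because it is consistent with the smallest-width factors, alongside the widths 32, 128, and 170 named in the text.

```python
import math

def sinc(a):
    """SINC[a] = sin(pi*a) / (pi*a), with SINC[0] = 1."""
    return 1.0 if a == 0 else math.sin(math.pi * a) / (math.pi * a)

# Spatial frequencies of the six components, in cycles per unit.
frequencies = [k / 128 for k in (1, 3, 5, 7, 9, 11)]

for d in (8, 32, 128, 170):          # detector widths in units
    factors = [round(sinc(d * xi), 3) for xi in frequencies]
    print(d, factors)
```

A negative factor marks a component whose contrast is reversed by the averaging.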
Illustration of the reduction in modulation due to “prefiltering”: (a) input function
$f[n]$; (b) result of prefiltering with uniform averagers of three widths, beginning with
$d = 0$; (c) magnified view of (b), showing the change in the signal; (d) result
of filtering with uniform averagers of three larger widths, including $d = X_0$, showing
the “contrast reversal” in the last case.


Chapter  14 

Review  of  Quantization 


14.1 Tone-Transfer Curve


The second operation of the digitization process converts the continuously valued
irradiance of each sample at the detector (i.e., the brightness) to an integer, i.e.,
the sampled image is quantized. The entire process of measuring and quantizing
the brightnesses is significantly affected by detector characteristics such as dynamic
range and linearity. The dynamic range of a detector is the range of brightness
(irradiance) over which a change in the input signal produces a detectable change in
the output. The input and output quantities need not be identical; the input may
be measured in units of irradiance and the output in optical density. The effect of the
detector on the measurement may be described by a transfer characteristic or tone-transfer
curve (TTC), i.e., a plot of the output vs. input for the detector. The shape of the
transfer characteristic may be used as a figure of merit for the measurement process.
A detector is linear if the TTC is a straight line, i.e., if an incremental change in
input from any level produces a fixed incremental change in the output. Of course,
all real detectors have a limited dynamic range, i.e., they will not respond at all
to light intensity below some minimum value and their response will not change for
intensities above some maximum. All realistic detectors are therefore nonlinear, but
there may be some regions over which they are more-or-less linear, with nonlinear
regions at either end. A common example is photographic film; the TTC is the
H-D curve, which plots recorded optical density of the emulsion vs. the logarithm of
the input irradiance. Another very important example in digital imaging is the
video camera, whose TTC maps input light intensity to output voltage. The transfer
characteristic of a video camera is approximately a power law:

$$V_{out} = c_1 B_{in}^{\gamma} + V_0$$

where $V_0$ is the threshold voltage for a dark input and $\gamma$ (gamma) is the exponent of
the power law. The value of $\gamma$ depends on the specific detector: typical values are
$\gamma = 1.7$ for a vidicon camera and $\gamma = 1$ for an image orthicon.
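
The power law is simple to evaluate. In this sketch the function name `video_ttc`, the default constants `c1 = 1` and `v0 = 0`, and the normalized input brightness are all assumptions made for illustration.

```python
def video_ttc(b_in, gamma, c1=1.0, v0=0.0):
    """Power-law transfer characteristic V_out = c1 * B_in**gamma + v0.
    b_in is a nonnegative input brightness (normalized here to 0..1)."""
    if b_in < 0:
        raise ValueError("input brightness must be nonnegative")
    return c1 * b_in ** gamma + v0

for b in (0.0, 0.25, 0.5, 1.0):
    # gamma = 1 is a linear response; gamma = 1.7 darkens the midtones.
    print(b, video_ttc(b, 1.0), video_ttc(b, 1.7))
```

With $\gamma = 1$ the response is a straight line, whereas $\gamma = 1.7$ maps mid-range inputs to proportionally lower outputs.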
Nonlinear tone-transfer curve of quantizer, showing a linear region.


14.2  Quantization 

Quantization converts continuously valued measured irradiance at a sample to a member
of a discrete set of gray levels or digital counts; e.g., the sample $f[x,y]$, with a
measured value such as $f[0,0] = 1.234567890\ldots$ (in irradiance units), is converted to an
integer between 0 and some maximum value (e.g., 255) by an analog-to-digital converter
(A/D converter or ADC). The number of levels is determined by the number of bits
available for quantization in the ADC. A quantizer with $m$ bits defines $M = 2^m$ levels.
The most common quantizers have $m = 8$ bits (one byte); such systems can specify 256
different gray levels (usually numbered from 0 to 255, where 0 is usually assigned to
“black” and 255 to “white”). Images digitized to 12 or even 16 bits are becoming more
common, and have 4096 and 65536 levels, respectively.

The  resolution,  or  step  size  b,  of  the  quantizer  is  the  difference  in  brightness 
between  adjacent  gray  levels.  It  makes  little  sense  to  quantize  with  a  resolution  b 
which  is  less  than  the  uncertainty  in  gray  level  due  to  noise  in  the  detector  system. 
Thus  the  effective  number  of  levels  is  often  less  than  the  maximum  possible. 

Conversion from a continuous range to discrete levels requires a thresholding operation
(e.g., truncation or rounding). Some range of input brightnesses will map to
a single output level, e.g., all measured irradiances between 0.76 and 0.77 (in irradiance
units) might map to gray level 59. Threshold conversion is a nonlinear operation, i.e.,
the threshold of a sum of two inputs is not necessarily the sum of the thresholded
outputs. The concept of linear operators will be discussed extensively later, but we
should say at this point that the nonlinearity due to quantization makes it
inappropriate to analyze the complete digital imaging system (digitizer, processor, and
display) by common linear methods. This problem is usually ignored, as is appropriate
for large numbers of quantized levels that are closely spaced so that the digitized image
appears continuous. Because the brightness resolution of the eye-brain is limited,
quantizing to only 50 levels is satisfactory for many images; in other words, 6 bits of
data is often sufficient for images to be viewed by humans.

The quantization operation is performed by digital comparators or sample-and-hold
circuits. The simplest quantizer converts an analog input voltage to a 1-bit
digital output and can be constructed from an ideal differential amplifier, where the
output voltage $V_{out}$ is proportional to the difference of two voltages $V_{in}$ and $V_{ref}$:

$$V_{out} = \alpha\left(V_{in} - V_{ref}\right)$$

$V_{ref}$ is a reference voltage provided by a known source. If $\alpha$ is large enough to
approximate $\infty$, then the output voltage will be $+\infty$ if $V_{in} > V_{ref}$ and $-\infty$ if
$V_{in} < V_{ref}$. We assign the digital value “1” to a positive output and “0” to a negative
output. A quantizer with better resolution can be constructed by cascading several
such digital comparators with equally spaced reference voltages. A digital translator
converts the comparator signals to the binary code. A 2-bit ADC is shown in the
figure:


Comparator  and  2-Bit  ADC.  The  comparator  is  a  “thresholder;”  its  output  is  “high” 
if  Vin  >  Vref  and  “low”  otherwise.  The  ADC  consists  of  4  comparators  whose 
reference  voltages  are  set  at  different  values  by  the  resistor-ladder  voltage  divider. 
The  translator  converts  the  4  thresholded  levels  to  a  binary-coded  signal. 


In most systems, the step size between adjacent quantized levels is fixed (“uniform
quantization”):

$$b = \frac{f_{max} - f_{min}}{2^m - 1}$$

where $f_{max}$ and $f_{min}$ are the extrema of the measured irradiances of the image samples
and $m$ is the number of bits of the quantizer.

If the darkest and brightest samples of a continuous-tone image have measured
irradiances $f_{min}$ and $f_{max}$ respectively, and the image is to be quantized using $m$ bits
($2^m$ gray levels), then we may define a set of uniformly spaced levels $f_q$ that span the
dynamic range via:

$$f_q[x,y] = Q\left\{\frac{f[x,y] - f_{min}}{b}\right\} = Q\left\{\frac{f[x,y] - f_{min}}{f_{max} - f_{min}} \cdot \left(2^m - 1\right)\right\}$$

where $Q\{\ \}$ represents the nonlinear truncation or rounding operation, e.g., $Q\{3.657\} = 3$
if $Q$ is truncation or 4 if $Q$ is rounding. The form of $Q$ determines the location of
the decision levels where the quantizer jumps from one level to the next. The image
irradiances are reconstructed by assigning all pixels with a particular gray level $f_q$ to
the same irradiance value $\hat{E}[x,y]$, which might be defined by “inverting” the
quantization relation. The reconstruction level is often placed between the decision levels
by adding a factor $\frac{b}{2}$:

$$\hat{E}[x,y] = \left(f_q[x,y] \cdot \frac{f_{max} - f_{min}}{2^m - 1}\right) + f_{min} + \frac{b}{2}$$

Usually (of course), the reconstructed value $\hat{E}[x,y]$ differs from the original irradiance
$f[x,y]$ due to the quantization, i.e., there will be quantization error. The goal of optimum
quantization is to adjust the quantization scheme to reconstruct the set of image
irradiances which most closely approximates the ensemble of original values. The criterion
which defines the goodness of fit and the statistics of the original irradiances will
determine the parameters of the quantizer, e.g., the set of thresholds between the levels.

The quantizer just described is memoryless, i.e., the quantization level for a pixel
is computed independently of that for any other pixel. The schematic of a memoryless
quantizer is shown below. As will be discussed, a quantizer with memory may have
significant advantages.
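
A memoryless uniform quantizer of this kind can be sketched in a few lines. The function names, the clamping of out-of-range inputs, and the `offset` parameter are assumptions made for illustration; the step size uses $2^m - 1$ in the denominator, as in the definition of $b$.

```python
import math

def step_size(f_min, f_max, m_bits):
    """Uniform step b = (f_max - f_min) / (2**m - 1)."""
    return (f_max - f_min) / (2 ** m_bits - 1)

def quantize(f, f_min, f_max, m_bits, mode="round"):
    """Map an irradiance f to an integer gray level in 0 .. 2**m - 1.
    mode is "round" or "truncate"; out-of-range inputs are clamped."""
    scaled = (f - f_min) / step_size(f_min, f_max, m_bits)
    level = math.floor(scaled + 0.5) if mode == "round" else math.floor(scaled)
    return max(0, min(2 ** m_bits - 1, level))

def reconstruct(level, f_min, f_max, m_bits, offset=0.0):
    """Invert the quantization relation; pass offset = b/2 to center the
    reconstruction level between decision levels when truncation was used."""
    return level * step_size(f_min, f_max, m_bits) + f_min + offset

print(quantize(3.657, 0.0, 255.0, 8, mode="truncate"))
print(quantize(3.657, 0.0, 255.0, 8, mode="round"))
print(reconstruct(quantize(3.657, 0.0, 255.0, 8), 0.0, 255.0, 8))
```

Each pixel is processed independently of its neighbors, which is what makes this quantizer memoryless.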


14.3  Quantization  Error  (“Noise”) 

The  gray  value  of  the  quantized  image  is  an  integer  value  which  is  related  to  the 
input  irradiance  at  that  sample.  For  uniform  quantization,  where  the  steps  between 
adjacent  levels  are  the  same  size,  the  constant  of  proportionality  is  the  difference  in 
irradiance  between  adjacent  quantized  levels.  The  difference  between  the  true  input 
irradiance  (or  brightness)  and  the  corresponding  irradiance  of  the  digital  level  is  the 
quantization  error  at  that  pixel: 

$$\varepsilon[n \cdot \Delta x, m \cdot \Delta y] = f[n \cdot \Delta x, m \cdot \Delta y] - f_q[n \cdot \Delta x, m \cdot \Delta y]$$

Note that the quantization error is bipolar in general, i.e., it may take on positive
or negative values. It often is useful to describe the statistical properties of the
quantization error, which will be a function of both the type of quantizer and the
input image. However, if the difference between quantization steps (i.e., the width
of a quantization level) is a constant $b$, the quantization error for most images may
be approximated as a uniform distribution with mean value $\langle \varepsilon[n] \rangle = 0$ and variance
$\langle (\varepsilon[n])^2 \rangle = \frac{b^2}{12}$. The error distribution will be demonstrated for two 1-D 256-sample
images. The first is a section of a cosine sampled at 256 points and quantized to 64
levels separated by $b = 1$:


Illustration of the statistics of quantization noise: (a) $f[n]$, a section of a cosine with
amplitude 63, for $0 \le n \le 255$; (b) after quantization by rounding to nearest integer;
(c) quantization error $\varepsilon[n] = f[n] - f_q[n]$, showing that $-\frac{1}{2} \le \varepsilon \le +\frac{1}{2}$; (d) histogram
of 256 samples of quantization error, showing that the statistics are approximately uniform.


The histogram of the error $\varepsilon_1[n] = f_1[n] - Q\{f_1[n]\}$ is approximately uniform over
the interval $-\frac{1}{2} \le \varepsilon_1 < +\frac{1}{2}$. The computed statistics of the error are
$\langle \varepsilon_1[n] \rangle = -5.1 \cdot 10^{-4} \approx 0$ and variance $\langle \varepsilon_1^2[n] \rangle = 0.08 \approx \frac{1}{12}$.

The second image consists of 256 samples of Gaussian-distributed random
noise in the interval [0, 63] that again is quantized to 64 levels. The histogram of the
error $\varepsilon_2[n]$ again is approximately uniformly distributed in the interval [−0.5, +0.5]
with mean $4.09 \cdot 10^{-2} \approx 0$ and variance $\sigma^2 = \langle \varepsilon_2^2[n] \rangle = 0.09 \approx \frac{1}{12}$.

Illustration of the statistics of quantization noise: (a) $f[n]$ is Gaussian noise with
measured $\mu = 27.7$, $\sigma = 10.9$ for $0 \le n \le 255$; (b) after quantization by rounding to
nearest integer; (c) quantization error $\varepsilon[n] = f[n] - f_q[n]$, showing that
$-\frac{1}{2} \le \varepsilon \le +\frac{1}{2}$; (d) histogram of 256 samples of quantization error, showing that the
statistics are STILL approximately uniform.
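
The uniform-error approximation can be reproduced with a short simulation. This is a sketch under stated assumptions: it uses Python's pseudorandom generator with a fixed seed, far more samples than the 256 in the figures (so that the estimates are stable), and rounding to the nearest integer with step $b = 1$.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Gaussian "image" with the mean and standard deviation quoted in the caption.
samples = [random.gauss(27.7, 10.9) for _ in range(20000)]

# Quantize by rounding to the nearest integer (step size b = 1).
errors = [x - math.floor(x + 0.5) for x in samples]

mean_error = sum(errors) / len(errors)
variance = sum((e - mean_error) ** 2 for e in errors) / len(errors)

print(mean_error, variance, 1 / 12)
```

The estimated variance should lie close to $b^2/12 \approx 0.083$ even though the input itself is Gaussian rather than uniform.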


The total quantization error is the sum of the quantization error over all pixels in
the image:

$$\varepsilon_{total} = \sum_n \sum_m \varepsilon[n \cdot \Delta x, m \cdot \Delta y]$$

An image with large bipolar error values thus may have a small total error. The
mean-squared error (average of the squared error) is a better descriptor of the fidelity
of the quantization:

$$\overline{\varepsilon^2} = \frac{1}{N}\sum_n \sum_m \varepsilon^2[n \cdot \Delta x, m \cdot \Delta y]$$

where $N$ is the number of pixels in the image. The mean-squared error has units of
irradiance squared. The root-mean-squared (RMS) error has the same dimensions as the
error:

$$\text{RMS Error} = \sqrt{\overline{\varepsilon^2}} = \sqrt{\frac{1}{N}\sum_n \sum_m \varepsilon^2[n \cdot \Delta x, m \cdot \Delta y]}$$

It  should  be  obvious  that  the  RMS  error  for  one  image  is  a  function  of  the  quantizer 
used,  and  that  the  RMS  error  from  one  quantizer  will  differ  for  different  images.  It 
should  also  be  obvious  that  it  is  desirable  to  minimize  the  RMS  error  in  an  image. 
The  brute-force  method  for  minimizing  quantization  error  is  to  add  more  bits  to  the 
ADC,  which  increases  the  cost  of  the  quantizer  and  the  memory  required  to  store  the 
image. 
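
The difference between the three error measures is easy to see on a tiny example. The function names and the four error values below are hypothetical and chosen only to make the cancellation obvious.

```python
import math

def total_error(errors):
    """Sum of the (bipolar) quantization errors."""
    return sum(errors)

def mean_squared_error(errors):
    """Average of the squared errors."""
    return sum(e * e for e in errors) / len(errors)

def rms_error(errors):
    """Square root of the mean-squared error; same units as the error."""
    return math.sqrt(mean_squared_error(errors))

errors = [0.5, -0.5, 0.25, -0.25]   # large bipolar errors that cancel
print(total_error(errors), mean_squared_error(errors), rms_error(errors))
```

The total error vanishes because the positive and negative values cancel, while the mean-squared and RMS errors remain nonzero.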


We now extend the discussion to consider the concepts of signal bandwidth and
digital data rate, which in turn require an understanding of signal-to-noise ratio
(SNR) and its relationship to quantization. Recall that the variance $\sigma^2$ of a signal is
a measure of the spread of its amplitude about the mean value:

$$\sigma_f^2 = \left\langle \left[f[x] - \langle f[x] \rangle\right]^2 \right\rangle = \frac{1}{X}\int_X \left[f[x] - \langle f[x] \rangle\right]^2 dx$$

where the average is taken over a domain of width $X$. The signal-to-noise power ratio
of an analog signal is most rigorously defined as the dimensionless ratio of the variances
of the signal and noise:

$$SNR \equiv \frac{\sigma_f^2}{\sigma_n^2}$$


Thus a large SNR means that there is a larger variation of the signal amplitude than
of the noise amplitude. This definition of SNR as the ratio of variances may vary
over a large range (easily several orders of magnitude), so that the numerical values
may become unwieldy. The range of SNR may be compressed by expressing it on a
logarithmic scale with dimensionless units of bels:

$$SNR = \log_{10}\left[\frac{\sigma_f^2}{\sigma_n^2}\right] \text{ [bels]}$$

This definition of SNR is even more commonly expressed in units of tenths of a bel
so that the integer value is more precise. The resulting metric is in terms of decibels:

$$SNR = 10\log_{10}\left[\frac{\sigma_f^2}{\sigma_n^2}\right] = 10\log_{10}\left[\left(\frac{\sigma_f}{\sigma_n}\right)^2\right] = 20\log_{10}\left[\frac{\sigma_f}{\sigma_n}\right] \text{ [decibels]}$$

Under this definition, SNR = 10 dB if the signal variance is ten times larger than the
noise variance and 20 dB if the signal standard deviation is ten times larger than that of
the noise.
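
The decibel definitions translate directly into code. The function names below are hypothetical; the two forms are equivalent because the variance is the square of the standard deviation.

```python
import math

def snr_db(signal_variance, noise_variance):
    """SNR in decibels from the ratio of variances."""
    return 10.0 * math.log10(signal_variance / noise_variance)

def snr_db_from_std(sigma_f, sigma_n):
    """The same quantity computed from the standard deviations."""
    return 20.0 * math.log10(sigma_f / sigma_n)

print(snr_db(10.0, 1.0))           # variance ratio of ten
print(snr_db_from_std(10.0, 1.0))  # standard-deviation ratio of ten
```

A standard-deviation ratio of ten corresponds to a variance ratio of one hundred, which is why the second value is twice the first.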


The variances obviously depend on the statistics (the histograms) of the signal
and noise. The variances depend only on the distribution of gray values and not on their
“arrangement” (i.e., numerical “order” or “pictorial” appearance in the image). Since
the noise often is determined by the measurement equipment, a single measurement
of the noise variance often is used for many signal amplitudes. However, the signal
variance must be measured each time. Consider the variances of some common 1-D
signals.


14.3.1  Example:  Variance  of  a  Sinusoid 


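The variance of a sinusoid with amplitude $A_0$ is easily computed by direct integration:

$$f[x] = A_0 \cos\left[2\pi \frac{x}{X_0}\right]$$

$$\sigma_f^2 = \frac{1}{X_0}\int_{-X_0/2}^{+X_0/2} \left(f[x] - \langle f[x] \rangle\right)^2 dx = \frac{1}{X_0}\int_{-X_0/2}^{+X_0/2} \left(A_0 \cos\left[2\pi \frac{x}{X_0}\right]\right)^2 dx$$

$$= \frac{A_0^2}{X_0}\int_{-X_0/2}^{+X_0/2} \frac{1}{2}\left(1 + \cos\left[4\pi \frac{x}{X_0}\right]\right) dx = \frac{A_0^2}{2X_0}\left(X_0 + 0\right)$$

$$\sigma_f^2 = \frac{A_0^2}{2} \text{ for sinusoid with amplitude } A_0$$

Note that the variance does not depend on the period (i.e., on the spatial frequency)
or on the initial phase $\phi_0$; it is a function of the histogram of the values in a period
and not of the “ordered” values. It also does not depend on any “bias” (additive
constant) in the signal. The standard deviation of the sinusoid is just the square root
of the variance:

$$\sigma_f = \frac{A_0}{\sqrt{2}} \text{ for sinusoid with amplitude } A_0$$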
14.3.2 Example: Variance of a Square Wave


The  variance  of  a  square  wave  with  the  same  amplitude  also  is  easily  evaluated  by 
integration  of  the  thresholded  sinusoid: 


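$$f[x] = A_0 \, \mathrm{SGN}\left[\cos\left[2\pi \frac{x}{X_0}\right]\right], \qquad \langle f \rangle = 0$$

$$\sigma_f^2 = \frac{1}{X_0} \int_{-X_0/2}^{+X_0/2} \left(f[x] - \langle f \rangle\right)^2 dx = \frac{1}{X_0} \left( \int_{-X_0/4}^{+X_0/4} \left[+A_0\right]^2 dx + \int_{+X_0/4}^{+3X_0/4} \left[-A_0\right]^2 dx \right)$$

$$= \frac{1}{X_0} \left( A_0^2 \frac{X_0}{2} + A_0^2 \frac{X_0}{2} \right) = A_0^2$$

$$\sigma_f^2 = A_0^2 \quad \text{for square wave with amplitude } A_0$$

$$\sigma_f = A_0 \quad \text{for square wave with amplitude } A_0$$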

Note that the variance of the square wave is larger than that of the sine wave with
the same amplitude:

$$\sigma_f^2 \text{ for square wave with amplitude } A_0 \; > \; \sigma_f^2 \text{ for sinusoid with amplitude } A_0$$


which  makes  intuitive  sense,  because  the  amplitude  of  the  square  wave  is  more  often 
“distant”  from  its  mean  than  the  sinusoid  is. 
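
The two results can be checked by sampling one period of each signal. This is a minimal sketch under stated assumptions: the period is sampled at the midpoints of equal subintervals (so that no sample falls exactly on a zero crossing of the cosine), and the helper names `variance` and `one_period` are hypothetical.

```python
import math

def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def one_period(func, n_samples=4096):
    """Evaluate func(phase) at the midpoints of n_samples equal steps of one period."""
    return [func(2.0 * math.pi * (k + 0.5) / n_samples) for k in range(n_samples)]

A0 = 3.0
sinusoid = one_period(lambda phase: A0 * math.cos(phase))
square = one_period(lambda phase: A0 * math.copysign(1.0, math.cos(phase)))

var_sinusoid = variance(sinusoid)   # compare with A0**2 / 2
var_square = variance(square)       # compare with A0**2
print(var_sinusoid, var_square)
```

Adding a constant bias to either signal, or shifting its phase, leaves both numbers unchanged, in agreement with the remarks above.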


14.3.3  Variance  of  “Noise”  from  a  Gaussian  Distribution 


A set of amplitudes selected at random from a Gaussian probability distribution is
called (conveniently enough) “Gaussian noise.” The most common definition of the
statistical distribution is:

$$p[n] = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(n - \mu)^2}{2\sigma^2}\right]$$


This probability distribution function has unit area, as required. The Gaussian
distribution is specified by the two parameters $\mu$, the mean value of the distribution,
and $\sigma^2$, its variance. The standard deviation $\sigma$ is a measure of the “width” of the
distribution and so influences the range of output amplitudes.

[Figure: Histogram of Gaussian noise with $\mu = 4$, $\sigma = 1$: histogram of 8192 samples taken from the Gaussian distribution $p[n] = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(n-4)^2}{2}\right]$.]


14.3.4  Approximations  to  SNR 

Since the variance depends on the statistics of the signal, it is common (though less
rigorous) to approximate the variance by the square of the dynamic range, which is
the “peak-to-peak signal amplitude” $f_{\max} - f_{\min} \equiv \Delta f$. In most cases, $(\Delta f)^2$ is larger
(and often much larger) than $\sigma_f^2$. In the examples of the sinusoid and the square wave
already considered, the approximations are:

Sinusoid with amplitude $A_0$ ==> $\sigma_f^2 = \frac{A_0^2}{2}$, $(\Delta f)^2 = (2A_0)^2 = 4A_0^2 = 8\,\sigma_f^2$

Square wave with amplitude $A_0$ ==> $\sigma_f^2 = A_0^2$, $(\Delta f)^2 = (2A_0)^2 = 4A_0^2 = 4\,\sigma_f^2$

For the example of Gaussian noise with variance $\sigma^2 = 1$ and mean $\mu$, the dynamic
range $\Delta f$ of the noise technically is infinite, but its extrema often may be approximated
based on the observation that few amplitudes exist outside of four standard deviations,
so that $f_{\max} = \mu + 4\sigma$, $f_{\min} = \mu - 4\sigma$, leading to $\Delta f = 8\sigma$. The estimate of the variance
of the signal is then $(\Delta f)^2 = 64\sigma^2$, which is (obviously) 64 times larger than the actual
variance. Because this estimate of the signal variance is too large, the estimates of
the SNR thus obtained will be too optimistic.

Often, the signal and noise of images are measured by photoelectric detectors as
differences in electrical potential in volts; the signal dynamic range is $V_f = V_{\max} - V_{\min}$,
the average noise voltage is $V_n$, and the signal-to-noise ratio is:

$$\mathrm{SNR} = 10 \log_{10}\left[\frac{V_f^2}{V_n^2}\right] = 20 \log_{10}\left[\frac{V_f}{V_n}\right] \quad [\text{dB}]$$

As an aside, we mention that the signal amplitude (or level) of analog electrical signals
often is described in terms of dB measured relative to some fixed reference. If the
reference level is 1 volt, the signal level is measured in units of dBV:

$$\text{level} = 10 \log_{10}\left[V_f^2\right] \text{ dBV} = 20 \log_{10}\left[V_f\right] \text{ dBV}$$

A level measured relative to 1 mV is in units of dBmV (the unit dBm conventionally
denotes power measured relative to 1 mW):

$$\text{level} = 10 \log_{10}\left[\frac{V_f^2}{\left(10^{-3}\ \text{V}\right)^2}\right] \text{ dBmV} = 20 \log_{10}\left[\frac{V_f}{10^{-3}\ \text{V}}\right] \text{ dBmV}$$


14.3.5  SNR  of  Quantization 


We can use these definitions to evaluate the signal-to-noise ratio of the quantization
process. Though the input signal and the type of quantizer determine the probability
density function of the quantization error in a strict sense, the two examples of
quantized sinusoidal and Gaussian-distributed signals both exhibited quantization
errors that were approximately uniformly distributed. We will continue this
assumption that the probability density function is a rectangle. In the
case of an m-bit uniform quantizer ($2^m$ gray levels) where the levels are spaced by
intervals of width $b$ over the full analog dynamic range of the signal, the error due
to quantization will be (approximately) uniformly distributed over this interval $b$.
If the nonlinearity of the quantizer is rounding, the mean value of the error is 0; if
truncation to the next lower integer, the mean value is $-\frac{b}{2}$. It is quite easy to evaluate
the variance of uniformly distributed noise:

$$\sigma_n^2 = \frac{1}{b} \int_{-b/2}^{+b/2} n^2 \, dn = \frac{b^2}{12}$$


For an m-bit quantizer and a signal with maximum and minimum amplitudes
$f_{\max}$ and $f_{\min}$, the width of a quantization level is:

$$b = \frac{f_{\max} - f_{\min}}{2^m} = \frac{\Delta f}{2^m}$$

and by assuming that the quantization noise is uniformly distributed, the variance of
the quantization noise is:

$$\sigma_n^2 = \frac{b^2}{12} = \frac{(\Delta f)^2}{12 \cdot \left(2^m\right)^2} = (\Delta f)^2 \cdot \left(12 \cdot 2^{2m}\right)^{-1}$$

The resulting SNR is the ratio of the variance of the signal to that of the quantization
noise:

$$\mathrm{SNR} = \frac{\sigma_f^2}{\sigma_n^2} = \sigma_f^2 \cdot \frac{12 \cdot 2^{2m}}{(\Delta f)^2}$$

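which, when expressed on a logarithmic scale, becomes:

$$\mathrm{SNR} = 10 \log_{10}\left[\sigma_f^2 \cdot 12 \cdot 2^{2m}\right] - 10 \log_{10}\left[(\Delta f)^2\right]$$

$$= 10 \log_{10}\left[\sigma_f^2\right] + 10 \log_{10}\left[12\right] + 20 m \log_{10}\left[2\right] - 10 \log_{10}\left[(\Delta f)^2\right]$$

$$\cong 10 \log_{10}\left[\sigma_f^2\right] + 10 \cdot 1.079 + 20 m \cdot 0.301 - 10 \log_{10}\left[(\Delta f)^2\right]$$

$$= 6.02\, m + 10.8 + 10 \log_{10}\left[\frac{\sigma_f^2}{(\Delta f)^2}\right] \quad [\text{dB}]$$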

The third term obviously depends on both the signal and the quantizer. This equation
certainly demonstrates that the SNR of quantization increases by approximately 6 dB for every
bit added to the quantizer. If using the (poor) estimate that $\sigma_f^2 = (\Delta f)^2$, then the
third term evaluates to zero and the approximate SNR is:

$$\mathrm{SNR} \text{ for quantization to } m \text{ bits} = 6.02\, m + 10.8 + 10 \log_{10}\left[1\right] = 6.02\, m + 10.8 \quad [\text{dB}]$$


The statistics of the signal (and thus its variance $\sigma_f^2$) may be approximated for
many types of signals (e.g., music, speech, realistic images) as resulting from a random
process. The histograms of these signals usually are peaked at or near the mean value
$\mu$ and the probability of a gray level decreases for values away from the mean; the
signal approximately is the output of a Gaussian random process with variance $\sigma_f^2$.
By selecting the dynamic range of the quantizer $\Delta f$ to be sufficiently larger than
$\sigma_f$, few (if any) levels should be saturated at and clipped by the quantizer. As
already stated, we assume that virtually no values are clipped if the maximum
and minimum levels of the quantizer are four standard deviations from the mean
level:

$$f_{\max} - \mu = \mu - f_{\min} = \frac{\Delta f}{2} = 4\,\sigma_f$$


In other words, we may choose the dynamic range of the quantizer (and thus the step
size between its levels) to satisfy the criterion:

$$\Delta f = 8\,\sigma_f \;\Longrightarrow\; \frac{\sigma_f^2}{(\Delta f)^2} = \frac{1}{64}$$

The SNR of the quantization process becomes:

$$\mathrm{SNR} = 6.02\, m + 10.8 + 10 \log_{10}\left[\frac{1}{64}\right] = 6.02\, m + 10.8 + 10 \cdot (-1.806) = 6.02\, m - 7.26 \quad [\text{dB}]$$

which is 18 dB less than the estimate obtained by assuming that $\sigma_f^2 = (\Delta f)^2$. This
again demonstrates that the original estimate of SNR was optimistic.

This expression for the SNR of quantizing a Gaussian-distributed random signal
with measured variance $\sigma_f^2$ may be demonstrated by quantizing that signal to m bits
over the range $f_{\min} = \mu - 4\sigma_f$ to $f_{\max} = \mu + 4\sigma_f$, and computing the variance of the
quantization error $\sigma_n^2$. The resulting SNR should satisfy the relation:

$$\mathrm{SNR} = 10 \log_{10}\left[\frac{\sigma_f^2}{\sigma_n^2}\right] = (6.02\, m - 7.26) \text{ dB}$$


The  SNR  of  a  noise-free  analog  signal  after  quantizing  to  8  bits  is  SNR8  =  41  dB ;  if 
quantized  to  16  bits  (common  in  CD  players),  SNR16  =  89  dB.  The  best  SNR  that 
can  be  obtained  from  analog  recording  (such  as  on  magnetic  tape)  is  about  65  dB, 
which  is  equivalent  to  that  from  a  signal  digitized  to  12  bits  per  sample  or  4096  gray 
levels. 
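
The relation SNR = (6.02 m - 7.26) dB can be tested by simulation. The sketch below is an illustration under stated assumptions, not a definitive implementation: it draws pseudorandom Gaussian samples, quantizes each to the center of one of 2^m equal-width levels spanning four standard deviations on either side of the mean, and compares the measured SNR with the prediction. The function names and the choice of 50000 samples are hypothetical.

```python
import math
import random

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def quantize_uniform(value, n_bits, f_min, f_max):
    """Map value to the center of one of 2**n_bits equal-width levels."""
    n_levels = 2 ** n_bits
    step = (f_max - f_min) / n_levels           # level width b
    index = int((value - f_min) / step)         # level that contains the value
    index = min(max(index, 0), n_levels - 1)    # clip the rare out-of-range values
    return f_min + (index + 0.5) * step

def measured_snr_db(n_bits, n_samples=50000, mu=0.0, sigma=1.0, seed=1):
    rng = random.Random(seed)
    signal = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    f_min, f_max = mu - 4.0 * sigma, mu + 4.0 * sigma
    error = [f - quantize_uniform(f, n_bits, f_min, f_max) for f in signal]
    return 10.0 * math.log10(variance(signal) / variance(error))

for m in (4, 6, 8):
    # bits, measured SNR [dB], predicted SNR [dB]
    print(m, round(measured_snr_db(m), 2), round(6.02 * m - 7.26, 2))
```

The agreement should be closest for moderate numbers of bits. As m grows, the uniform quantization error shrinks while the error from the few samples clipped beyond four standard deviations does not, so the measured SNR is expected to fall somewhat below the prediction.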

The flip side of this problem is to determine the effective number of quantization
bits after digitizing a noisy analog signal. This problem was investigated by Shannon
in 1948. The analog signal is partly characterized by its bandwidth $\Delta\nu$ [Hz], which
is the analog analogue of the concept of digital data rate [bits per second]. The
bandwidth is the width of the region of support of the signal spectrum (its Fourier
transform).

When sampling and quantizing a noisy analog signal, the bit rate is determined by
the signal-to-noise ratio of the analog signal. According to Shannon, the bandwidth
$\Delta\nu$ of a transmission channel is related to the maximum digital data rate $R_{\max}$ and
the dimensionless signal-to-noise power ratio SNR via:

$$R_{\max} = (2 \cdot \Delta\nu) \log_2\left[1 + \mathrm{SNR}\right]$$

where Shannon defined the SNR to be the ratio of the peak signal power to the average
white noise power. It is very important to note that the SNR in this equation is a
dimensionless ratio; it is NOT compressed via a logarithm and is not measured in
dB. The factor of 2 is needed to account for the negative frequencies in the signal.
The quantity $\log_2\left[1 + \mathrm{SNR}\right]$ is the number of effective quantization bits, and may
be seen intuitively in the following way: if the total dynamic range of the signal
amplitude is $S$, the dynamic range of the signal power is $S^2$. If the noise power
(the variance of the noise) is $\sigma^2$, then the effective number of quantization transitions
is the power SNR, or $\frac{S^2}{\sigma^2}$. The number of quantization levels is $1 + \mathrm{SNR}$, and the
effective number of quantization bits is $\log_2\left[1 + \mathrm{SNR}\right]$.


14.4  Quantizers  with  Memory  —  Error  Diffusion 

Another  way  to  change  the  quantization  error  is  to  use  a  quantizer  with  memory, 
which means that the quantized value at a pixel is determined in part by the
quantization error at nearby pixels. A schematic diagram of the quantizer with memory
is:

[Figure: Flow chart for quantizer with memory, with input f[x, y].]


A simple method for quantizing with memory that generally results in reduced total
error without a priori knowledge of the statistics of the input image and without
adding much additional complexity of computation was introduced by Floyd and
Steinberg (Proc. SID, 17, pp. 75-77, 1975) as a means to simulate gray-level
images on binary image displays and is known as error diffusion. It is easily adapted
to multilevel image quantization. As indicated by the name, in error diffusion the
quantization error from one pixel is used in the computation of the levels of
succeeding pixels. In its simplest form, all of the quantization error at one pixel is
carried to the next pixel before quantization. In the 1-D case, the quantization
level at sample location x is obtained from the gray level of the sample minus the
error e[x - 1] at the preceding pixel, where the error is the quantized level minus
the value that was presented to the quantizer:

$$f_q[x] = Q\left\{ f[x] - e[x-1] \right\}$$

$$e[x] = f_q[x] - \left( f[x] - e[x-1] \right) = Q\left\{ f[x] - e[x-1] \right\} - \left( f[x] - e[x-1] \right)$$

In  the  2-D  case,  the  error  may  be  weighted  and  propagated  in  different  directions. 
A  discussion  of  the  use  of  error  diffusion  in  ADC  was  given  by  Anastassiou  (IEEE 
Trans.  Circuits  and  Systems,  36,  1175,  1989). 
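
The 1-D recursion can be sketched in a few lines. This is a minimal illustration, not the published Floyd-Steinberg algorithm (which distributes weighted error over several 2-D neighbors). It assumes one specific sign convention, stated in the comments: the carried error is the quantized level minus the value presented to the quantizer, so that subtracting it from the next sample compensates for the previous error. The function names and the binary quantizer with threshold 0.5 are hypothetical choices.

```python
def quantize_binary(value, threshold=0.5):
    """Memoryless binary quantizer: output is 0.0 or 1.0."""
    return 1.0 if value >= threshold else 0.0

def error_diffuse_1d(samples, quantizer=quantize_binary):
    """Quantize a 1-D sequence, carrying all of the error to the next sample.

    Assumed convention: e[x] = f_q[x] - (f[x] - e[x-1]).
    """
    output = []
    carried_error = 0.0                      # e[x-1], zero before the first sample
    for f in samples:
        modified = f - carried_error         # value presented to the quantizer
        f_q = quantizer(modified)
        carried_error = f_q - modified       # e[x]
        output.append(f_q)
    return output

# Constant gray level of 0.3: the memoryless quantizer outputs all zeros,
# while the density of ones from error diffusion approaches 0.3.
constant = [0.3] * 1000
memoryless = [quantize_binary(f) for f in constant]
diffused = error_diffuse_1d(constant)
print(sum(memoryless) / len(memoryless), sum(diffused) / len(diffused))

# Linear ramp from 0 to 1: the density of ones increases along the ramp.
ramp = [k / 63 for k in range(64)]
print("".join(str(int(v)) for v in error_diffuse_1d(ramp)))
```

With this convention the accumulated error stays bounded, so the running sum of the output never differs from the running sum of the input by more than half a level, which is why the local density of ones tracks the local gray level.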


The examples on the following pages demonstrate the effects of binary quantization
on gray-level images. The images of the ramp demonstrate why the binarizer with
memory is often called pulse-density modulation. Note that the error-diffused images
convey more information about fine detail than the images from the memoryless
quantizer. This is accomplished by possibly enhancing the local binarization error.

2-D error-diffused quantization for three different grayscale images: (a) linear ramp
image, after quantizing at the midgray level, and after Floyd-Steinberg error
diffusion at the midgray level; (b) same sequence for “Lincoln”; (c) same sequence
for “Liberty.” The error-diffused images convey more information about the larger
spatial frequencies.


14.5 Image Display Systems: Digital-to-Analog Conversion

A complete image processing system must regenerate a viewable signal from the
quantized samples. This requires that the digital signal be converted back to a continuously
varying brightness distribution; analog estimates of the samples of the original signal
are derived by a digital-to-analog converter (DAC) and the brightness is spread over
the viewing area by the interpolation of the display. Each of these processes will be
discussed in turn, beginning with the DAC.

The principle of the DAC is very intuitive; each bit of the digital signal represents
a piece of the desired output voltage that is generated by a voltage divider ladder
network and a summing amplifier. For example, if a 4-bit digital signal is represented
by the binary word ABCD, the desired output voltage is:

$$V_{out} = V \left(8A + 4B + 2C + D\right)$$

where V is the desired voltage for a signal represented by the binary word 0001. The
appropriate DAC circuit is shown below:

[Figure: Digital-to-analog converter circuit for 4-bit binary input with bit values ABCD (binary word ABCD). The circuit generates an analog output voltage V = D + 2C + 4B + 8A.]

Variations  of  the  circuit  shown  are  more  practical  for  long  binary  words,  but  the 
principle  remains  the  same.  Note  that  the  output  voltage  is  analog,  but  it  is  still 
quantized,  i.e.,  only  a  finite  set  of  output  voltages  is  possible  (ignoring  any  noise). 
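
The weighted sum performed by the DAC can be mimicked numerically. This is only an arithmetic sketch of the input-output relation, not a circuit model; the function name `dac_output` and the parameter `unit_voltage` (the voltage V assigned to the word 0001) are hypothetical.

```python
def dac_output(bits, unit_voltage=1.0):
    """Ideal DAC output for a binary word given most-significant bit first.

    For the 4-bit word (A, B, C, D) this evaluates V * (8A + 4B + 2C + D).
    """
    level = 0
    for bit in bits:
        level = 2 * level + bit      # shift in the next bit
    return unit_voltage * level

print(dac_output((0, 0, 0, 1), unit_voltage=0.25))   # the word 0001
print(dac_output((1, 0, 1, 1), unit_voltage=0.25))   # the word 1011

# Only 2**4 = 16 distinct output voltages are possible for a 4-bit word.
words = [(a, b, c, d) for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)]
print(len({dac_output(word) for word in words}))
```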


14.6  Image  Interpolation 

The image display generates a continuously varying function g[x, y] from the processed
image samples $g_q[n, m]$. This is accomplished by defining an interpolator that is placed
at each sample with the same amplitude as the sample. The continuously varying
reconstructed image is the sum of the scaled interpolation functions. This is analogous
to the connect-the-dots puzzle for children to fill in the contours of a picture.
Mathematically, interpolation may be expressed as a convolution of the output sampled
image with an interpolation function (the postfilter) $h_2$. In 1-D:

$$g[x] = \sum_{n=-\infty}^{+\infty} g_q[n \cdot \Delta x] \cdot h_2[x - n \cdot \Delta x] = g_q[x] * h_2[x]$$

In  an  image  display,  the  form  of  the  interpolation  function  is  determined  by  the 
hardware  and  may  have  very  significant  effects  on  the  character  of  the  displayed 
image. For common cathode-ray tubes (CRTs, the television tube), the interpolation
function is approximately a Gaussian function, but is often further approximated by
a circle (or cylinder) function.

The effect of the interpolator on the output is illustrated by a few simple examples.
In the 1-D case, the input is a sinusoid with period $X_0 = 64$ sampled at intervals
$\Delta x = 8$. The interpolators are a RECT function (nearest-neighbor interpolator), triangle
function (linear interpolator), cubic B-spline, and a Gaussian. Examples for 2-D
images are shown on following pages.


14.6.1  Ideal  Interpolation 


In the discussion of the Whittaker-Shannon sampling theorem, we have stated that
an  unaliased  function  can  be  perfectly  reconstructed  from  its  unaliased  ideal  samples. 
Actually,  as  stated  the  theorem  is  true  but  a  bit  misleading.  To  be  clearer,  we  could 
say  the  following: 

Any  function  can  be  perfectly  reconstructed  from  an  infinite  number  of  unaliased 
samples,  i.e.,  samples  obtained  at  a  rate  greater  than  two  times  per  period  of  the 
highest  frequency  component  in  the  original  function. 

In  reality,  of  course,  we  always  have  a  finite  number  of  samples,  and  thus  we  cannot 
perfectly  reconstruct  an  arbitrary  function.  Periodic  functions  may  be  reconstructed, 
however,  because  the  samples  of  a  single  period  will  be  sufficient  to  recover  the  entire 
function. 

In the example just presented, the ideal interpolation function must be something
other than a rectangle or Gaussian function. We will again assert without proof that
the ideal interpolator for samples separated by a distance $\Delta x$ is:

$$h_2[x] = \mathrm{SINC}\left[\frac{x}{\Delta x}\right]$$


Note  that  the  SINC  function  has  infinite  support  and  is  bipolar;  thus  it  is  not 
obvious  how  to  implement  such  a  display.  However,  we  can  illustrate  the  result  by 
using  the  example  of  the  sampled  cosine  already  considered.  Note  that  the  cosine  is 
periodic. 
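
The interpolation sum can be evaluated directly for the sampled cosine with period 64 and sampling interval 8. The sketch below is illustrative only and rests on several assumptions: SINC[u] is taken to be sin(πu)/(πu), the infinite sum is truncated to samples n = -512 ... +512 (so the SINC result is approximate rather than perfect), and the kernel and function names are hypothetical.

```python
import math

def rect(u):
    """Nearest-neighbor interpolator (unit-width rectangle)."""
    return 1.0 if -0.5 <= u < 0.5 else 0.0

def tri(u):
    """Linear interpolator (triangle of half-width 1)."""
    return max(0.0, 1.0 - abs(u))

def sinc(u):
    """Ideal interpolator, assuming SINC[u] = sin(pi u) / (pi u)."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def interpolate(x, sample_values, dx, kernel):
    """g[x] = sum over n of g_q[n dx] * h2[x - n dx], with h2[x] = kernel(x / dx)."""
    return sum(g * kernel(x / dx - n) for n, g in sample_values.items())

X0, DX, N_MAX = 64.0, 8.0, 512          # period, sampling interval, truncation limit
samples = {n: math.cos(2.0 * math.pi * n * DX / X0) for n in range(-N_MAX, N_MAX + 1)}

x = 4.0                                  # midway between the samples at x = 0 and x = 8
exact = math.cos(2.0 * math.pi * x / X0)
errors = {
    name: abs(interpolate(x, samples, DX, kernel) - exact)
    for name, kernel in (("rect", rect), ("tri", tri), ("sinc", sinc))
}
print(errors)
```

At this location the error is expected to decrease from the rectangle to the triangle to the SINC interpolator; the residual SINC error comes from truncating the infinite sum.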


[Figure: Ideal interpolation of the function f[x] = cos[2πx] sampled at intervals Δx. The weighted Dirac delta functions at each sample are replaced by weighted SINC functions (three shown, for n = 0, -1, -3), which are summed to reconstruct the original cosine function.]


14.6.2 Modulation Transfer Function of Sampling

We have just demonstrated that images may be perfectly reconstructed from
unaliased and unquantized ideal samples obtained at intervals $\Delta x$ by interpolating with
$\mathrm{SINC}\left[\frac{x}{\Delta x}\right]$. Of course, reconstructed images obtained from a finite number of
samples from a system with averaging and quantization will not be
perfect. We now digress to illustrate a common metric for imaging system quality
by applying it to realistically sampled systems. Though it is not strictly appropriate,
the illustration is still instructive.

Averaging by the detector ensures that the modulation of a reconstructed
sinusoid g[x] will generally be less than that of the continuous input function f[x], i.e.,
image modulation is imperfectly transferred from the input to the reconstructed
output. The transfer of modulation can be quantified for sinusoids of each frequency;
because the averaging effect of the digitizer is fixed, higher-frequency sinusoids will
be more affected than lower frequencies. A plot of the modulation transfer vs. spatial
frequency is the modulation transfer function or MTF. Note that MTF describes a
characteristic of the system, not the input or output.

For ideal sampling (and ideal reconstruction) at all frequencies less than Nyquist,
the input function f[x] is perfectly reconstructed from the sample values $f_s[n \cdot \Delta x]$,
and therefore the modulation transfer function is unity for spatial frequencies less
than $\frac{1}{2}$ cycle per pixel.


Sinusoids with frequencies $\xi$ greater than the Nyquist frequency are aliased by ideal
sampling. The “new” frequency is less than the Nyquist frequency.

Because the output frequency is different from the input frequency,
it is not sensible to talk about the transfer of modulation for frequencies above Nyquist.


[Figure: Schematic of the modulation transfer function of the cascade of ideal sampling and ideal interpolation (vertical axis: modulation transfer); the MTF is unity at all spatial frequencies out to the Nyquist frequency.]


14.6.3 MTF of Realistic Sampling (Finite Detectors)


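We have already demonstrated that the modulation due to uniform averaging depends
on the detector width $d$ and the spatial frequency $\xi$ of the function as $\mathrm{SINC}[d\,\xi]$. If
the detector size is half the sampling interval ($d = \frac{\Delta x}{2}$), the MTF at the Nyquist
frequency $\xi = \frac{1}{2 \Delta x}$ is:

$$\mathrm{MTF} = \mathrm{SINC}\left[d\,\xi\right] = \mathrm{SINC}\left[\frac{\Delta x}{2} \cdot \frac{1}{2\,\Delta x}\right] = \mathrm{SINC}\left[\frac{1}{4}\right] = \frac{\sin\left[\frac{\pi}{4}\right]}{\frac{\pi}{4}} \cong 0.9$$

The modulation is nonzero at every frequency up to Nyquist, so the input
can still be reconstructed perfectly by appropriately amplifying the attenuated
sinusoidal components, a process known as inverse filtering that will be considered
later. In the common case of detector size equal to sampling interval ($d = \Delta x$), the
minimum MTF is SINC[0.5] = 0.637 at the Nyquist frequency.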
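
The values of the detector MTF at the Nyquist frequency are easy to tabulate for several detector widths, including the scanned cases d = 2·Δx and d = 3·Δx considered in this section. This is a minimal sketch assuming SINC[u] = sin(πu)/(πu); the function name `detector_mtf` is hypothetical.

```python
import math

def sinc(u):
    """SINC[u] = sin(pi u) / (pi u), with SINC[0] = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def detector_mtf(xi, d):
    """Modulation transfer of uniform averaging over a detector of width d."""
    return sinc(d * xi)

dx = 1.0                       # sampling interval
nyquist = 1.0 / (2.0 * dx)     # Nyquist frequency in cycles per unit length

for ratio in (0.5, 1.0, 2.0, 3.0):
    # detector width in units of dx, MTF at the Nyquist frequency
    print(ratio, round(detector_mtf(nyquist, ratio * dx), 3))
```

A negative value in the last case corresponds to the contrast reversal of the reconstructed sinusoid.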


[Figure: MTF of sampling for $d = \frac{\Delta x}{2}$ and $d = \Delta x$, plotted against $\xi$ in units of $(\Delta x)^{-1}$.]


By scanning, we can sample the input sequentially, and it is thus possible to use a
detector size larger than the sampling interval. If $d = 2 \cdot \Delta x$, then the detector
integrates over a full period of a sinusoid at the Nyquist frequency; the averaged
signal at this frequency is constant (usually zero, i.e., no modulation).

For larger scanned detectors, the modulation can invert, i.e., the contrast of
sinusoids over a range of frequencies can actually reverse. This has already been shown
for the case $d\,\xi = 1.5$ ==> $d = 3 \cdot \Delta x$ at the Nyquist frequency.

[Figure: MTF of scanning systems with $d = 2 \cdot \Delta x$ and $d = 3 \cdot \Delta x$, plotted against $\xi$ in units of $(\Delta x)^{-1}$, showing that the MTF = 0 at one frequency and is negative for larger spatial frequencies approaching the Nyquist frequency in the second case. This leads to a phase shift of the reconstructed sinusoids.]


If  the  inputs  are  square  waves,  the  analogous  figure  of  merit  is  the  contrast  transfer 
function  or  CTF. 


14.7  Effect  of  Phase  Reversal  on  Image  Quality 


To  illustrate  the  effect  on  the  image  of  contrast  reversal  due  to  detector  size,  consider 
the  examples  shown  below. 

The input was imaged with two different systems: the MTF of the first system
reversed the phase of sinusoids with higher frequencies, while the second did not.
Note the sharper edges of the letters in the second image:

[Figure: Effect of phase reversal on image quality. Left panel: with phase reversal; right panel: without phase reversal (both image axes run from -32 to +31; the accompanying plot has horizontal axis $\xi$ in units of $(\Delta x)^{-1}$). The edges are arguably “sharper” with the phase reversal.]


14.8 Summary of Effects of Sampling and Quantization

ideal sampling ==> aliasing if undersampled

realistic sampling ==> aliasing if undersampled; modulation reduced at all
nonzero spatial frequencies

quantization ==> error is inherent in the nonlinear operation;
more bits, less noise ==> less error


14.9 Spatial Resolution

Photographic resolution is typically measured by some figure of merit like cycles per mm or
line pairs per mm, which are the maximum visible spatial frequency of a recorded
sine wave or square wave, respectively. Visibility is typically defined by a specific
value of the emulsion’s modulation transfer function (MTF, for sinusoids) or contrast
transfer function (CTF, for square waves). The specific point of the modulation curve
that is used as the resolution criterion may be different in different applications. For
example, the resolution of imagery in highly critical applications might be measured
as the spatial frequency where the modulation transfer is 0.9, while the frequency
where the MTF is 0 may be used for noncritical applications. The spatial resolution
of digital images may be measured in similar fashion from the MTF curve due to
sampling, which we have just determined to be a function of the sampling interval
$\Delta x$ and the detector width $d$. The maximum frequency that can be reconstructed
is the Nyquist limit $\xi_{\max} = \frac{1}{2\,\Delta x}$, and the modulation at spatial frequency $\xi$ varies
with the detector size as $\mathrm{SINC}[d\,\xi]$. In remote sensing, it is common to use the
instantaneous field of view (IFOV) and ground instantaneous field of view (GIFOV).
The IFOV is the full angle subtended by the detector size $d$ at the entrance pupil of
the optical system. The GIFOV of a digital imaging system is the spatial size of the
detector projected onto the object, e.g., the GIFOV of the French SPOT satellite is
10 m. The term GIFOV is inappropriate for the definition; spot size
would be better.


Chapter  15 

Point  Operations 


Once the image data has been sampled, quantized, and stored in the computer, the
next task is processing to improve the image, i.e., to extract some (or more)
information from the data. The various image processing operations O{ } are applied to
the digital input image $f_s[n \cdot \Delta x, m \cdot \Delta y]$ to obtain a (generally) different output
$g_s[n' \cdot \Delta x, m' \cdot \Delta y]$. From this point on, all images may be considered as sampled data
and thus the subscript s will be ignored and the coordinates will be labeled by [x, y].
The general operator has the form

$$\mathcal{O}\left\{ f[x, y] \right\} = g[x', y']$$

The various operators O can be grouped based on the number and location of pixels
of the input image f that affect the computation of a particular output pixel g[x, y].
One possible set of categories is:

1. Point Operations on single images: The gray value of the output image g
at a particular pixel [x, y] depends ONLY on the gray value of the same pixel in
f; examples of these operations include contrast stretching, segmentation based
on gray value, and histogram equalization;

2. Point Operations on multiple images: The gray value of the output pixel
g[x, y] depends on the gray values of the same pixel in a set of input images
$f[x, y, t_n]$ or $f[x, y, \lambda_n]$; examples are segmentation based on variations in time
or color, multiple-frame averaging for noise smoothing, change detection, and
spatial detector normalization;

3. Neighborhood Operations on one image: The gray value of g at a
particular pixel [x, y] depends on the gray values of pixels in the neighborhood of
the same pixel in f[x, y]; examples include convolution (as for image
smoothing or sharpening) and spatial feature detection (e.g., line, edge, and corner
detection);

4. Neighborhood Operations on multiple images: This is just a
generalization of (3); the pixel g[x, y] depends on pixels in the spatial and temporal
(or spectral) neighborhood of $[x, y, t_n]$ or $[x, y, \lambda_n]$; examples include
spatial/temporal convolution and spatial/spectral convolution;

5. Operations based on Object “Shape” (e.g., “structural” or “morphological”
operations): The gray level of the output pixel is determined by the object
class to which a pixel belongs; examples include classification, segmentation,
data compression, character recognition;

6. Geometrical Operations: The pixels f[x, y] are remapped to a new
coordinate system to obtain g[x, y]; examples include image warping and cartography;

7. “Global” Operations: The gray value of the output image at a pixel
depends on the gray values of all of the pixels of f[x, y]; these include image
transformations, e.g., Fourier, Hartley, Hough, Haar, Radon transforms.


15.1  Point  Operations  on  Single  Images 

The gray value of each pixel in the output image g[x, y] depends on the gray value
of only the corresponding pixel of the input image f[x, y]. Every pixel of f[x, y]
with the same gray level maps to a single (usually different) gray value in the output
image.


Schematic  of  a  point  operation  on  a  single  image:  the  gray  value  of  the  output  pixel 
is  determined  ONLY  by  the  gray  value  of  the  corresponding  input  pixel. 

In a point operation, the only available parameter that determines the output
pixel is the gray value of that one input pixel. Therefore, the point operation must
affect all pixels with the same input gray level $f_0$ in the same fashion; they all change
to the same output gray value $g_0$. When designing the action of the point operation,
it often is very useful to know the pixel population H as a function of gray level f:
the histogram H[f] of the image. The amplitude of the histogram at gray value f is
proportional to the probability of occurrence of that gray value.

[Figure: Histogram of f[x, y].]
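
Because the output depends only on the input gray value, any point operation on a quantized image can be implemented as a lookup table with one entry per gray level. The following sketch illustrates this for a linear contrast stretch; it assumes an 8-bit image stored as nested lists of integers, and the function names and the limits 100 and 150 are hypothetical.

```python
def linear_stretch_table(f_low, f_high, n_levels=256):
    """Lookup table that maps [f_low, f_high] linearly onto [0, n_levels - 1]."""
    g_max = n_levels - 1
    table = []
    for f in range(n_levels):
        g = (f - f_low) * g_max / (f_high - f_low)
        table.append(int(min(max(g, 0), g_max) + 0.5))   # clip, then round
    return table

def apply_point_operation(image, table):
    """Replace each gray value f by table[f]; equal inputs give equal outputs."""
    return [[table[f] for f in row] for row in image]

# Low-contrast image whose gray values occupy only the range 100 to 150.
image = [[100, 110, 120],
         [130, 140, 150]]
stretched = apply_point_operation(image, linear_stretch_table(100, 150))
print(stretched)
```

The table is computed once and then applied to every pixel, so the cost of the operation is independent of how complicated the gray-level mapping is.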


15.1.1  Image  Histograms 

The histogram of the 2-D image f[x, y] plots the population of pixels with each gray
level f. The histogram generally is represented as a 1-D function H[f] where the
independent variable is the gray value f and the dependent variable is the number of
pixels H with that level. The histogram depicts a particular feature of the image: the
population of gray levels. It arguably represents the simplest feature of the image,
where a feature is a measure of a useful characteristic of the image. The histogram may
be called a feature space and its properties may be used to segment the image pixels into
component groups.

The histogram often contains valuable global information about the image. For
example, most pixels in a low-contrast image are contained in a narrow range of gray
levels, so the histogram is concentrated within that small interval. An image with a
bimodal histogram (the histogram of gray values exhibits two modes, i.e., a histogram
with two “peaks”) often consists of a foreground object (whose pixels are concentrated
around a single average gray value) on top of a background object whose pixels have
a different average gray value.

Because all pixels in the image must have some gray value in the allowed range,
the sum of populations of the histogram bins must equal the total number of image
pixels N:

$$\sum_{f=0}^{f_{max}} H[f] = N$$

where f_max is the maximum gray value (f_max = 255 for an 8-bit quantizer). The
histogram function is a scaled replica of the probability distribution function of gray
levels in that image. The discrete probability distribution function p[f] must satisfy
the constraint:

$$\sum_{f=0}^{f_{max}} p[f] = 1$$

and therefore the probability distribution and the histogram are related by the simple
expression:

$$p[f] = \frac{1}{N} H[f]$$

As will be described later during the discussion of image compression and information
theory, the probability distribution leads to a measure of the “quantity” of information
in the image, which is the minimum number of bits of data that is required to store
the image and generally is measured in bits per pixel. If the probability of gray
level f in the image f[x,y] is represented as p[f], the definition of the quantity of
information in the image is:

$$I[f] = -\sum_{f=0}^{f_{max}} p[f] \, \log_2\left(p[f]\right) \quad \left[\frac{\text{bits}}{\text{pixel}}\right]$$

From this definition, it is easy to show that the maximum information content is
obtained if each gray level has the same probability; in other words, a flat histogram
corresponds to maximum information content.
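
The relations among H[f], p[f], and the information content can be checked
numerically. The following Python sketch is an illustration only: the function names
and the two tiny 2-bit test “images” are hypothetical and not taken from the text, and
plain lists are used so that it runs without external libraries.

```python
import math

def histogram(pixels, levels):
    """Return H[f], the number of pixels at each gray level 0..levels-1."""
    H = [0] * levels
    for f in pixels:
        H[f] += 1
    return H

def information_bits_per_pixel(H):
    """Evaluate I = -sum p[f] log2 p[f] with p[f] = H[f]/N (empty levels skipped)."""
    N = sum(H)
    return -sum((h / N) * math.log2(h / N) for h in H if h > 0)

# Hypothetical 2-bit "images" (4 gray levels), flattened to 1-D lists.
flat_image = [0, 1, 2, 3] * 4        # every level equally populated
bitonal_image = [0] * 8 + [3] * 8    # only two levels occupied

H_flat = histogram(flat_image, 4)
H_bitonal = histogram(bitonal_image, 4)
I_flat = information_bits_per_pixel(H_flat)
I_bitonal = information_bits_per_pixel(H_bitonal)
print(H_flat, I_flat)
print(H_bitonal, I_bitonal)
```

With four available levels, the flat histogram yields log2(4) = 2 bits per pixel, the
largest value possible, while the bitonal image yields 1 bit per pixel.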


15.1.2  Histograms  of  Typical  Images 

The  form  of  the  image  histogram  often  indicates  the  character  of  the  original  image: 
a  bitonal  or  binary  image  will  have  only  two  gray  levels  occupied;  an  image  composed 
of  a  small  dark  object  on  a  large  bright  background  will  have  a  bimodal  histogram; 
a  low-contrast  image  will  have  a  small  number  of  contiguous  levels  occupied;  and  an 
image  with  a  large  information  content  will  have  a  flat  histogram. 


[Figure: example histograms H[f] plotted against gray level, including the “flat”
histogram of a high-contrast image.]


15.1.3  Cumulative  Histogram


Given an N-pixel image f[x,y] having gray values in the range 0 ≤ f ≤ f_max and
histogram H[f], the cumulative histogram evaluated at gray value f_0 is the
number of pixels with gray value less than or equal to f_0:

$$C[f_0] = \sum_{f=0}^{f_0} H[f] = N \sum_{f=0}^{f_0} p[f]$$

The value of the cumulative histogram at the maximum gray value is the number of
pixels in the image:

$$C[f_{max}] = \sum_{f=0}^{f_{max}} H[f] = N$$

In the case of a continuous probability distribution, the cumulative histogram is an
integral over gray level:

$$C[f_0] = \int_{0}^{f_0} H[f] \, df$$

The cumulative histogram is used to derive the mapping that maximizes the global
visibility of changes in image gray value (histogram equalization) and for deriving an
output image with a specific histogram (histogram specification).


If  H  [/]  is  flat,  so  that  every  gray  level  “has”  the  same  number  of  pixels,  then 
the  associated  cumulative  histogram  C  [f]  increases  by  the  same  number  of  pixels  for 
each  gray  value,  and  thus  forms  a  linear  “ramp”  function  at  45°.  The  cumulative 
histogram  of  a  low-contrast  image  rises  rapidly  in  the  gray  levels  where  most  pixels 
lie  and  slowly  over  the  other  levels. 
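
A short Python sketch of the two behaviors just described, under the assumption of
hypothetical 4-level histograms with 16 pixels each (the names and numbers are
illustrative only):

```python
from itertools import accumulate

def cumulative_histogram(H):
    """Return C[f0] = sum of H[f] for all f <= f0."""
    return list(accumulate(H))

H_flat = [4, 4, 4, 4]     # flat histogram: same population at every level
H_low = [0, 14, 2, 0]     # low-contrast image: most pixels at one level

C_flat = cumulative_histogram(H_flat)   # equal steps: a linear ramp
C_low = cumulative_histogram(H_low)     # one steep rise, then nearly constant
print(C_flat)
print(C_low)
```

Both cumulative histograms end at the total pixel count N = 16, as required.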

[Figure: cumulative histograms of two images, each plotted against image gray level
from 0 to 255.]


15.1.4  Histogram  Modification  for  Image  Enhancement 

In point processing, the only parameter available in the pixel transformation is the
gray value of that pixel; all pixels of the same gray level must be transformed
identically by a point process. The mapping from input gray level f to output level g is
called a lookup table, or LUT. Lookup tables can be graphically plotted as
transformations g[f] that relate the input gray level (plotted on the x-axis) to the output
gray level (on the y-axis). One such operation is the “spreading out” of the compact
histogram from a low-contrast image over the full available dynamic range to make
the image information more visible.

Examples  of  Point  Operations 

The output resulting from the first mapping below is identical to the input, while the
output derived from the second mapping has inverted contrast, i.e., white becomes
black and black becomes white.

First row: identity lookup table g[f] = f, the resulting image g[x,y] = f[x,y], and
its histogram. Second row: the “negative” lookup table g[f] = 255 - f, the resulting
image, and its histogram, showing that the histogram is “reversed” by the lookup
table.

First row: lookup table that decreases the contrast, the resulting image, and the
histogram showing the concentration of pixels in the middle third of the range.
Second row: lookup table for linear contrast enhancement, its result when applied to
the low-contrast image, and the “spread-out” histogram.
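
The lookup tables in these examples are easy to build explicitly. The sketch below is
a minimal illustration in Python; the function names, the clipping behavior outside
the stretched interval, and the sample pixel values are assumptions made for the
example rather than details given in the text.

```python
LEVELS = 256

def apply_lut(pixels, lut):
    """Point operation: every pixel with gray level f becomes lut[f]."""
    return [lut[f] for f in pixels]

identity_lut = list(range(LEVELS))                 # g[f] = f
negative_lut = [255 - f for f in range(LEVELS)]    # g[f] = 255 - f

def linear_stretch_lut(f_lo, f_hi, levels=LEVELS):
    """Map [f_lo, f_hi] linearly onto [0, levels-1]; clip levels outside it."""
    top = levels - 1
    lut = []
    for f in range(levels):
        g = round(top * (f - f_lo) / (f_hi - f_lo))
        lut.append(min(max(g, 0), top))
    return lut

# Hypothetical low-contrast pixels confined to the middle third of the range.
low_contrast = [85, 100, 128, 150, 170]
stretch_lut = linear_stretch_lut(85, 170)
stretched = apply_lut(low_contrast, stretch_lut)
print(stretched)
```

The stretched values span the full range 0 to 255, which is the “spreading out” of
the histogram described in the caption.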


Remapping  Histograms 

As already mentioned, the image histogram is proportional to the probability
distribution of gray levels in the image. The action of any lookup table on an image
may be modeled as a transformation of probabilities. Recall that the area under any
continuous probability density p[f] or discrete probability distribution p_f is unity:

$$\int_{0}^{f_{max}} p[f] \, df = 1 \quad \text{or} \quad \sum_{f=0}^{f_{max}} p_f = 1$$

For histograms, the corresponding equations are:

$$\int_{0}^{f_{max}} H[f] \, df = N \quad \text{or} \quad \sum_{f=0}^{f_{max}} H[f] = N \quad \text{(total number of pixels)},$$

which merely states that every image pixel has some gray level between 0 and f_max.
Similarly for the output image:

$$\sum_{g=0}^{g_{max}} H[g] = N.$$

The input and output histograms H[f] and H[g] are related to the lookup table
transformation g[f] via the basic principles of probability theory. The fact that the
number of pixels must be conserved requires that incremental areas under the two
histograms must match, i.e., if input gray level f_0 becomes output level g_0, then:

$$H[f_0] \, df = H[g_0] \, dg \quad \text{(continuous gray levels)}$$

$$H[f_0] = H[g_0] \quad \text{(discrete gray levels)}$$

These equations merely state that all input pixels with level f_0 are mapped to level
g_0 in the output. In the discrete case, if the lookup table maps several input levels
to the same g_0, then their populations add in H[g_0].
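
Because a point operation acts identically on all pixels of one gray level, the output
histogram can be computed from the input histogram and the lookup table alone,
without touching the image. The Python sketch below illustrates this with a
hypothetical 4-level histogram and two hypothetical lookup tables; the names are
illustrative assumptions.

```python
def remap_histogram(H_in, lut, out_levels):
    """Move the population of each input level f to output level lut[f]."""
    H_out = [0] * out_levels
    for f, count in enumerate(H_in):
        H_out[lut[f]] += count
    return H_out

H_in = [1, 5, 8, 2]          # hypothetical input histogram, N = 16
negative = [3, 2, 1, 0]      # one-to-one LUT: histogram is "reversed"
merge = [0, 1, 1, 3]         # levels 1 and 2 both map to output level 1

H_neg = remap_histogram(H_in, negative, 4)
H_merged = remap_histogram(H_in, merge, 4)
print(H_neg)
print(H_merged)
```

In both cases the total population is conserved; when two input levels share an
output level, their populations add and an output level is left empty.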

15.1.5  Jones  Plots 

It may be useful to plot the histogram of the input image, the lookup table, and the
histogram of the output image on the same Jones plot that shows the relationship
among them. The input histogram is upside down at the lower right; the output
histogram (rotated 90° counterclockwise) is at the upper left; the lookup table is
at the upper right. The new gray level g_0 is determined by mapping the value f_0
through the curve g[f] onto the vertical axis. In this case, the gray levels of the
original low-contrast image are spread out to create a higher-contrast image.

Effect of the mapping g[f] on the input histogram H[f] (plotted upside down) to
produce the output histogram H[g].


15.2  Histogram  Equalization  (“Flattening”) 

The quantity of information in an image f[x,y] was defined by Shannon to be:

$$I[f] = -\sum_{f=0}^{f_{max}} p[f] \, \log_2\left(p[f]\right) \quad \left[\frac{\text{bits}}{\text{pixel}}\right]$$

The meaning of information will be considered in more detail in the discussion of
image compression. The information content in an image is maximized if all gray
levels are equally populated. This ensures that the differences in gray level within
the image are spread out over the widest possible range and thus maximizes the
ability to distinguish differences in gray values. Therefore, the act of maximizing
image information results in the histogram with gray levels populated as uniformly
as possible; the process is called histogram equalization or flattening. The appropriate
lookup table is proportional to the cumulative histogram of the input image C[f].
The mathematical derivation of the appropriate g[f] is straightforward.

Assume the point operation (lookup table) O{f[x,y]} = g[x,y] equalizes the output
histogram, i.e., H[g] is flat. For simplicity, assume that gray levels are continuous
and that the lookup transformation g[f] is monotonically increasing:

$$g[f_0] = g_0$$

$$g[f_0 + \Delta f] = g_0 + \Delta g$$


Since the lookup table g[f] must be a monotonically increasing function, the
corresponding inverse operation must exist (call it g^{-1}), so that g^{-1}[g_0] = f_0. Because
the number of pixels must be conserved (each pixel in f[x,y] is also in g[x,y]),
the continuous probabilities must satisfy the relation:

$$p[f] \, df = p[g] \, dg$$

$$\Longrightarrow \frac{H[f]}{N} \, df = \frac{H[g]}{N} \, dg$$

$$\Longrightarrow H[f] \, df = H[g] \, dg$$

but H[g] is constant by assumption (flat histogram), so substitute H[g] = k, a
constant:

$$H[f] \, df = k \, dg$$

Integrate both sides over the range of allowed levels from 0 to f_0. The integral
evaluated at gray level f_0 is:

$$k \int_{g[0]}^{g[f_0]} dg = \int_{0}^{f_0} H[f] \, df = C[f_0]$$

$$k \cdot g[f_0] - k \cdot g[f=0] = C[f_0]$$

$$g[f_0] = \frac{1}{k} \cdot C[f_0] + g[f=0]$$

The proportionality constant k may be evaluated from the number R of available gray
levels (dynamic range) and the image “area” A:

$$k = \frac{A}{R}$$

In the discrete case of an N × N image, the proportionality constant is k = N^2 / M, where
M is the number of available gray levels and N^2 is the number of image pixels.

The lookup table that equalizes the image histogram is:

$$g_{flat}[f_0] = \frac{M}{N^2} \, C[f_0] + g[f=0]$$

Since all pixels with the same discrete gray level f_0 are treated identically by the
transformation, the histogram is “flattened” by spreading densely occupied gray
values into “neighboring,” yet sparsely occupied, gray levels. The resulting histogram is
as “flat” as can be obtained without basing the mapping on features other than gray
level.

The local areas under the input and flattened histograms must match; where H[f]
is large, the interval Δf is spread out to a larger Δg, thus enhancing contrast. Where
H[f] is small, the interval Δf maps to a smaller Δg, thus reducing contrast.


Continuous Case

Jones plot for histogram equalization in the continuous case. The areas under the
original and “flattened” histograms must match: where H[f] is large, the interval
Δf is spread out into a larger Δg, thus enhancing contrast. Where H[f] is small,
the interval Δf is compressed into a smaller range of Δg, thus reducing contrast.


Adjacent well-populated gray levels are spread out, thus leaving gaps (i.e.,
unpopulated levels) in the output histogram. Pixels in adjacent sparsely populated gray
levels of f[x,y] often are merged into a single level in g[x,y], which eliminates the
gray-level differences of those pixels.

Discrete Case

[Figure panels: scaled cumulative histogram; input histogram (upside down).]

Jones plot for contrast enhancement in the discrete case.
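
The discrete behavior (gaps between well-populated levels, merging of sparse levels)
can be seen in a small numerical sketch. The Python below is an illustration under
stated assumptions: the 8-level histogram is hypothetical, and the cumulative
histogram is scaled by (M - 1)/N_total rather than M/N_total so that the brightest
output stays inside the range 0 to M - 1, which is a common practical choice and
not necessarily the scaling intended in the derivation.

```python
from itertools import accumulate

def equalization_lut(H, out_levels):
    """LUT proportional to the cumulative histogram, scaled to 0..out_levels-1."""
    C = list(accumulate(H))          # cumulative histogram C[f]
    n_pixels = C[-1]                 # total number of pixels
    return [round((out_levels - 1) * c / n_pixels) for c in C]

# Hypothetical low-contrast histogram: 16 pixels crowded into levels 2..4 of 8.
H_in = [0, 0, 6, 8, 2, 0, 0, 0]
lut = equalization_lut(H_in, 8)

# Output histogram: the population of level f moves to level lut[f].
H_out = [0] * 8
for f, count in enumerate(H_in):
    H_out[lut[f]] += count

print(lut)
print(H_out)
```

The three occupied input levels 2, 3, 4 are moved to levels 3, 6, 7, so the occupied
levels now span more of the range, but the output histogram is not truly flat
because each bin must move as a unit.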


15.2.1  Example  of  Histogram  Equalization  —  1-D  “Image” 

Because the equalization operation acts on the histogram, and thus only indirectly
on the image, the mathematical operation does not depend on the number of spatial
dimensions in the input image; the process applies to images with any number of
dimensions. For simplicity of presentation, consider first the equalization of a 1-D
function. The “image” has the form of a decaying exponential with 256 pixels
quantized to 6 bits (values 0 ≤ f ≤ 63). The object is shown in (a), its histogram
in (b), and its cumulative histogram in (c). Note that the histogram is significantly
clustered; there are more “dark” than “light” pixels. The lookup table for histogram
equalization is a scaled replica of the cumulative histogram and is shown in (d). The
cumulative histogram of the equalized output image is C[g] in (e), and the output
histogram H[g] in (f). Note that the form of the output image in (g) is approximately
linear, significantly different from the decaying exponential object in (a). In other
words, the operation of histogram equalization changed both the spatial character
and the quantization. The gray levels with large populations (dark pixels)
have been spread apart in the equalized image, while levels with few pixels have been
compressed together.

[Figure panels: (a) 1-D input image f[n]; (b) histogram of input H[f]; (c) cumulative
histogram of input C[f]; (d) lookup table; (e) cumulative histogram of output C[g];
(f) histogram of output H[g]; (g) 1-D output image g[n]. Histogram abscissas are
labeled “Gray Value”.]

Illustration of histogram flattening of a 1-D function: (a) 256 samples of f[n],
which is a decaying exponential quantized to 64 levels; (b) its histogram H[f],
showing the smaller population of larger gray values; (c) cumulative histogram
C[f]; (d) lookup table, which is a scaled replica of C[f]; (e) cumulative histogram of
output C[g], which more closely resembles a linear ramp; (f) histogram H[g], which
shows the wider “spacing” between levels with large populations; (g) output image
g[n] after quantization.
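
The 1-D example can be imitated with a few lines of Python. This is a sketch only:
the decay constant TAU is an assumed value (the text does not give one), and the
lookup table is scaled by (LEVELS - 1)/N as a practical assumption so that the
output stays within 0 to 63.

```python
import math
from itertools import accumulate

N_SAMPLES, LEVELS = 256, 64
TAU = 64.0   # assumed decay constant in samples; not specified in the text

# (a) decaying exponential quantized to 6 bits
f = [round((LEVELS - 1) * math.exp(-n / TAU)) for n in range(N_SAMPLES)]

# (b) histogram and (c) cumulative histogram of the input
H = [0] * LEVELS
for value in f:
    H[value] += 1
C = list(accumulate(H))

# (d) lookup table: scaled replica of the cumulative histogram
lut = [round((LEVELS - 1) * c / N_SAMPLES) for c in C]

# (g) equalized output and its deviation from a straight line
g = [lut[value] for value in f]
ramp = [(LEVELS - 1) * (N_SAMPLES - n) / N_SAMPLES for n in range(N_SAMPLES)]
max_deviation = max(abs(gn - rn) for gn, rn in zip(g, ramp))
print(f[:4], g[:4], max_deviation)
```

With this choice of TAU the output g[n] stays within a few gray levels of a linear
ramp, consistent with panel (g); the remaining “staircase” arises because all pixels
sharing one input level must share one output level.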
15.2.2  Example of Histogram Equalization

[Figure panels: low-contrast image; low-contrast histogram; cumulative histogram
(proportional to the lookup table); equalized histogram; contrast-enhanced image;
histogram after linear enhancement; image after linear enhancement.]

Histogram statistics reported in the figure:
Low-contrast histogram: 102400 points, mean 127.309, SD 16.9729, mode 149,
median 123, skew 0.179, kurtosis 1.57.
Equalized histogram: mode 229, median 127, skew -0.00058, kurtosis 1.79.
Histogram after linear enhancement: 102400 points, mean 115.978, SD 52.7263,
mode 183, median 103, skew 0.178, kurtosis 1.58.

First row: low-contrast image and its concentrated histogram. Second row: nonlinear
cumulative histogram which is proportional to the lookup table for histogram
equalization, the resulting histogram (showing the “spreading out” of well-occupied
levels and the “smooshing” of levels with small populations), and the resulting image.
Third row: the histogram and image resulting from linear contrast enhancement.


15.2.3  Nonlinear  Nature  of  Histogram  Equalization 

The equalization lookup table in the 1-D example just considered is not a straight line,
which means that the gray value g of the “output” pixel is NOT proportional to f,
and thus the mapping of histogram equalization clearly is NOT linear. For subjective
applications, where the visual “appearance” of the output image is the only concern,
the nonlinearity typically poses no problem. However, if two images with different
histograms are to be compared in a quantitatively meaningful way (e.g., to detect
seasonal changes from images taken from an airborne platform), then independent
histogram equalization of the two images before comparison is not appropriate because
the two equalizations generally are different nonlinear operations. Nonlinear operations
produce unpredictable effects on the spatial frequency content of the scene, as you saw
in the linear mathematics course. Images should be compared either after applying
linear mappings based on pixels of known absolute “brightness”, or after the histogram
specification process discussed next.

Nonlinear mappings are used deliberately in a “compandor”, which is a composite
word blending “compressor” and “expandor”. The process of companding is used to
maintain the signal dynamic range and thus improve the “signal-to-noise” ratio in a
noise reduction system. A common companding system used in audio systems is the
well-known Dolby noise reduction system that is still used for recording analog audio
signals on magnetic tape. Analog signals are recorded on tape by aligning appropriate
percentages of magnetic domains beneath the recording head. Unavoidable statistical
variations in the percentage of aligned domains generate an audible noise signal called
tape “hiss” even if no signal is recorded. The Dolby system boosts the amplitude of
low-level high-frequency input signals before recording; this is called “pre-emphasis.”
The amount of amplification decreases with increasing level of the input signal. The
complementary process of “de-emphasis” is performed on playback. The annoying
tape hiss is attenuated while the recorded signal is faithfully reproduced. Compandors
also are used in digital imaging systems to preserve highlight and shadow detail.


15.3  Histogram  Specification 


It is often useful to transform the histogram of an image to create a new image
whose histogram “matches” that of some reference image f_ref[x,y]. This process
of histogram specification is a generalization of histogram equalization and allows
direct comparison of images perhaps taken under different conditions, e.g., LANDSAT
images taken through different illuminations or atmospheric conditions. The required
transformation of the histogram of f_1 to H[f_ref] may be derived by first equalizing
the histograms of both images:

$$\mathcal{O}_{ref}\{f_{ref}[x,y]\} = e_{ref}[x,y]$$

$$\mathcal{O}_{1}\{f_{1}[x,y]\} = e_{1}[x,y]$$

where e_n[x,y] is the image of f_n[x,y] with a flat histogram obtained from the operator
O{ }; the histograms of e_ref and e_1 are “identical” (both are flat). The inverse of
the lookup table transformation for the reference image is O_ref^{-1}{e_ref} = f_ref. The
lookup table for histogram specification of the input image is obtained by first deriving
the lookup tables that would flatten the histograms of the input and reference image.
It should be noted that some gray levels will not be specified by this transformation
and so must be interpolated. The functional form of the operation is:

$$g_1[x,y] \ \text{(with specified histogram)} = \mathcal{O}_{ref}^{-1}\{\mathcal{O}_1\{f_1\}\}$$

$$= \left[\mathcal{O}_{ref}^{-1} \cdot \mathcal{O}_1\right]\{f_1\} \propto C_{ref}^{-1}\{C_1\{f_1\}\}$$

[Figure panels: cumulative histogram of input image f_1[n,m]; cumulative histogram
of specified histogram; histogram of input image f_1[n,m] (upside down); specified
histogram (upside down).]

Schematic of Histogram Specification: given input image f_1[x,y] and desired
“reference” histogram H[f_0], the input gray value is mapped through its cumulative
histogram C[f_1] and the “inverse” of the reference cumulative histogram C[f_0] to
find the “output” gray value f_0.
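
The two-step mapping in the schematic (forward through the input cumulative
histogram, backward through the reference cumulative histogram) may be sketched
in Python as follows. The function name, the 4-level histograms, and the rule used
for the discrete “inverse” (take the smallest reference level whose cumulative
fraction reaches that of the input level) are assumptions made for illustration;
other interpolation rules are possible.

```python
from itertools import accumulate

def specification_lut(H_in, H_ref):
    """LUT that maps input levels so the output histogram approximates H_ref."""
    C_in = list(accumulate(H_in))
    C_ref = list(accumulate(H_ref))
    n_in, n_ref = C_in[-1], C_ref[-1]
    lut, g = [], 0
    for c in C_in:
        # Advance to the first reference level with C_ref[g]/n_ref >= c/n_in;
        # cross-multiplying keeps the comparison in exact integer arithmetic.
        while g < len(C_ref) - 1 and C_ref[g] * n_in < c * n_ref:
            g += 1
        lut.append(g)
    return lut

H_in = [8, 4, 2, 2]      # hypothetical dark image
H_ref = [2, 2, 4, 8]     # hypothetical bright reference histogram
lut = specification_lut(H_in, H_ref)

H_out = [0] * len(H_ref)
for f, count in enumerate(H_in):
    H_out[lut[f]] += count
print(lut, H_out)
```

The output histogram is shifted toward the bright levels of the reference but cannot
match it exactly, and some output levels are never assigned, which illustrates why
some gray levels are left unspecified by the transformation.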


15.4  Application of Histograms to Tone-Transfer Correction


Histogram specification may be used to compensate for a nonlinear tone-transfer curve
to ensure that the overall tone scale is linear. The recorded image g_1[n·Δx, m·Δy] is
obtained from the sampled input image f[n·Δx, m·Δy] through the transfer curve
(lookup table) g_1[f], which may be measured by digitizing a linear step wedge. The
inverse of the transfer curve may be calculated and cascaded as a second lookup table
g_2 to linearize the total transfer curve:

$$g_2\left[g_1\{f[n \cdot \Delta x, m \cdot \Delta y]\}\right] = f[n \cdot \Delta x, m \cdot \Delta y]$$

$$\Longrightarrow g_2[g_1] = 1$$

$$\Longrightarrow g_2[x] = g_1^{-1}[x]$$

Note that the display may be linearized in similar fashion. Consider a nonlinear
digitizer transfer curve of the form g_1[f] = \sqrt{f}. The correction curve necessary to
linearize the system is:

$$g_2\left[g_1[f]\right] = g_2\left[\sqrt{f}\right] = f \quad \Longrightarrow \quad g_2[x] = x^2$$
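
A numerical check of the square-root example, written as a Python sketch. The
normalization of both curves to an 8-bit range (so that level 255 maps to 255) is an
assumption introduced here to make the lookup tables usable; the text states the
curves without normalization.

```python
import math

LEVELS = 256
TOP = LEVELS - 1

# Assumed normalized digitizer curve: g1[f] = TOP * sqrt(f / TOP)
g1 = [round(TOP * math.sqrt(f / TOP)) for f in range(LEVELS)]

# Correction curve, the inverse of g1: g2[x] = TOP * (x / TOP)^2
g2 = [round(TOP * (x / TOP) ** 2) for x in range(LEVELS)]

# Cascade the two lookup tables and compare with the identity mapping.
cascade = [g2[g1[f]] for f in range(LEVELS)]
max_error = max(abs(cascade[f] - f) for f in range(LEVELS))
print(max_error)
```

The cascade reproduces the input to within one gray level; the residual error comes
from rounding g_1 and g_2 to integers, not from the form of the correction.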


15.5  Application of Histograms to Image Segmentation


Obviously, histograms may be used to distinguish among objects in the image that
differ in gray level; this is the simplest example of segmentation in a feature space.
Consider the bimodal histogram that often indicates the presence of a brighter object
on a darker background. A gray value f_T may be determined from the histogram
and used as a threshold to segment the “foreground” object. If the histogram
clusters overlap (as they seemingly always do), then there are bound to be some false
identifications. If background pixels that should be thresholded to black appear as
white, we speak of “false positives”, whereas foreground pixels that are classified as
background are “false negatives.”


[Figure panels: bimodal histogram; thresholding lookup table.]

Bimodal histogram, showing the intermixing of the “tails” of the two object classes,
which produces false identifications in the image created by the thresholding lookup
table.


The threshold lookup table maps all pixels with gray levels greater than f_T to white
and all others to black. If the histogram clusters are disjoint and the threshold is well
chosen (and if the image really contains a bright foreground object), a binary image
of the foreground object will result. In the case shown, the histogram likely is composed
of two overlapping Gaussian clusters, and thus some pixels likely will be misclassified by
the threshold. Segmentation based on gray level only will be imperfect; there will be
false positive pixels (background pixels classified as foreground) and false negative
pixels (foreground pixels classified as background). Consider the crude 64 × 64 5-bit
image, which shows several distinguishable objects even though the histogram exhibits
only two obvious clusters. Segmentation based on this histogram will be unsatisfactory. A
theme of the study of image processing operations will be to improve segmentation by
gathering or processing data to create histograms with compact and distinguishable
clusters.
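
The threshold lookup table and the two kinds of misclassification can be illustrated
with a short Python sketch. The pixel values, their “true” class labels, and the
threshold level are hypothetical numbers chosen so that the two clusters overlap.

```python
def threshold_lut(f_T, levels=256):
    """Map gray levels greater than f_T to white (255), all others to black (0)."""
    return [255 if f > f_T else 0 for f in range(levels)]

# Hypothetical pixels with known class membership (overlapping clusters).
pixels = [40, 60, 90, 130, 120, 150, 180, 210]
is_foreground = [False, False, False, False, True, True, True, True]

lut = threshold_lut(125)
binary = [lut[f] for f in pixels]

# Background pixel rendered white = false positive;
# foreground pixel rendered black = false negative.
false_pos = sum(1 for b, fg in zip(binary, is_foreground) if b == 255 and not fg)
false_neg = sum(1 for b, fg in zip(binary, is_foreground) if b == 0 and fg)
print(binary, false_pos, false_neg)
```

Because one background pixel (130) lies above the threshold and one foreground
pixel (120) lies below it, no single threshold separates these two classes perfectly.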


[Figure panels: original noisy image; its histogram over gray levels 0 to 256 with the
threshold level marked (count 65536, min 0, max 255, mean 164.022, standard
deviation 30.464, mode 189 with 4598 pixels); threshold lookup table; thresholded
image showing false identifications.]

Segmentation of noisy image from histogram; the histogram contains four obvious
“clusters”; the lookup table segmented the clusters at level 158, which segmented the
sky, clouds, and door from the grass, house, and tree, but some white pixels appear
in the grass (“false positives”) and some black pixels in the sky (“false negatives”).


This result illustrates the goal of histogram segmentation: to find some “feature space”
(histogram)  where  the  clusters  of  pixels  from  the  various  objects  are  “compact”  and 
“far  apart.” 

Other nonlinear mappings may be used for segmentation. For example, the upper
LUT on the left maps background pixels to black and foreground pixels to their
original gray level. The other is a level slicer; gray levels below f_1 and above f_2 map
to zero while those with f_1 < f[x,y] < f_2 are thresholded to white.
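
Both mappings are simple to express as lookup tables. In the Python sketch below,
the function names and the numerical limits are hypothetical, and 8-bit gray levels
are assumed.

```python
def background_suppress_lut(f_T, levels=256):
    """Background (f <= f_T) goes to black; foreground keeps its gray level."""
    return [f if f > f_T else 0 for f in range(levels)]

def level_slicer_lut(f_1, f_2, levels=256):
    """Levels strictly between f_1 and f_2 go to white; all others go to zero."""
    return [levels - 1 if f_1 < f < f_2 else 0 for f in range(levels)]

keep = background_suppress_lut(125)
slicer = level_slicer_lut(100, 150)

pixels = [40, 101, 126, 149, 150, 210]    # hypothetical gray values
kept = [keep[f] for f in pixels]
sliced = [slicer[f] for f in pixels]
print(kept)
print(sliced)
```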
15.6  Point Operations on Multiple Images

$$g[x,y] = \mathcal{O}\{f[x,y,t_n]\}$$
$$g[x,y] = \mathcal{O}\{f[x,y,\lambda_n]\}$$
$$g[x,y] = \mathcal{O}\{f[x,y,z_n]\}$$

The output pixel g[x,y] is a function of the gray value of that pixel in several input
images. The input frames may differ in time, wavelength, depth (if they are slices of a
3-D scene), resolution, etc. Most commonly, the gray values of the multiple inputs are
combined by arithmetic operations (e.g., addition, multiplication); binary images (i.e.,
two gray values) may be combined via logical operators (e.g., AND, XOR, etc.). It is
also very common to generate a multidimensional histogram from the multiple inputs
and use the interpreted data to segment the image via multispectral thresholding.
Applications:

1.  Image  segmentation  using  multispectral  information 

2.  Averaging  multiple  frames  for  noise  reduction 

3.  Change  detection  by  subtraction 

4.  Windowing  images  by  mask  or  template  multiplication 

5.  Correct  for  detector  nonuniformity  by  division 
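
Several of these applications reduce to one arithmetic operation applied pixel by
pixel across the input frames. The Python sketch below uses hypothetical 3-pixel
“images” and made-up values purely to show the form of each operation.

```python
# Averaging multiple frames of the same scene for noise reduction
frames = [[10, 12, 11], [12, 10, 13], [11, 11, 12]]
average = [sum(px) / len(frames) for px in zip(*frames)]

# Change detection by subtraction of two frames taken at different times
before = [10, 50, 10]
after = [10, 20, 10]
change = [a - b for a, b in zip(after, before)]

# Windowing by multiplication with a binary mask
mask = [0, 1, 1]
windowed = [m * f for m, f in zip(mask, before)]

# Correction for detector nonuniformity by division by a gain image
raw = [50, 100, 25]
gain = [0.5, 1.0, 0.25]      # assumed relative response of each detector
corrected = [r / k for r, k in zip(raw, gain)]

print(average, change, windowed, corrected)
```

In each case the output pixel depends only on the values at the same location
[x,y] in the inputs, which is what makes these point operations.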


We begin this discussion by immediately digressing to the most common class of
multiple-image system, that of color vision where the images differ in the wavelength
λ.

15.7  Digression:  Introduction  to  Vision  and  Color

One  of  the  more  common  systems  in  this  category  is  the  combination  of  three  mono¬ 
chrome  images  to  create  a  color  image.  To  understand  this  principle,  we  need  to 
introduce  the  human  visual  system  (HVS). 


15.7.1  The  Eye 

The eye is (obviously, and no pun intended) the human sensor that collects radiant
energy and forms an image. It contains an arrangement of two positive lenses that
generates a real image on the light-sensitive retina. Kepler (1604) described vision in
terms of the image projected onto the retina. Scheiner confirmed Kepler’s description
by looking at the image created by an eye in 1625.

Structures in the eye (from http://www.tedmontgomery.com/the_eye/)


The eye is nearly spherical (2.4 cm by 2.2 cm across). The vitreous humor (behind
the lens) “supports” the eyeball, much as air inflates a balloon. It contains
microparticles of cellular debris floating within, which produce entoptic perception. Within
the sclera is the choroid, a dark layer that absorbs stray light in the same manner as
the black coating inside a camera. The retina is a thin layer of light receptor cells
that covers the inner surface of the choroid. The retinas of (at least most) human eyes
have four kinds of receptors: rods and three kinds of cones. The receptors work via
a photochemical reaction in a photopigment.

Humans have approximately 75 to 150 million rods distributed over the retinal surface.
They are arrayed in groups of several rods connected to a single nerve ending. This feature
increases the sensitivity of the eye, but decreases the spatial resolution discernible by
these receptors. Rods increase in density from the center to about 20° off axis and then
decrease in density out to the extreme periphery. Thus they provide an overall picture
of the field of view. The rods contain rhodopsin, which is a “blue-green” pigment,
but they are not sensitive to color and are used at low levels of illumination (scotopic
vision). Rod vision is better in low-light situations because a ganglion cell will fire
when a certain threshold signal summed over all of its connected sensors is reached; it is
easier to reach the threshold with more receptors collecting light. Objects that appear
brightly colored in daylight are seen as colorless in dim light because only the rods are
stimulated.

Humans have approximately 6 to 7 million cones in each eye that are concentrated in the
central portion of the retina, the fovea. The cones are the color receptors. Each
cone is connected to its own nerve ending, thus ensuring a better spatial resolution
than the ganged-together rods. Vision generated by cones is called photopic and is
applicable at “normal” (daylight) levels of illumination. Under these conditions, the
eye motion muscles rotate the eyeball until the image of an object falls on the fovea,
where the cones provide color and high resolution. The eye motion “jiggles”
the image on the retina; the image would fade out if kept stationary on a given
spot of photoreceptors. Without the fovea the eye would lose 90% of its capability,
retaining only peripheral vision.

Angular distribution of rods and cones across the retina, also showing the location of
the “blind spot” due to the optic nerve.

For simplicity, we often think of the three cone receptors as sensitive to red, green,
and blue light, though this is not strictly correct. The English scientist Thomas Young
(a contributor to several areas of optics) hypothesized in 1801 that the eye has three
color receptors. This was the basis for his theory of trichromacy, which observed
that the existence of three independent attributes of colored light (hue, saturation, and
lightness) suggests that the eye is sensitive to three independent color input signals.
Helmholtz hypothesized that the three types of cones are primarily sensitive to short,
medium, and long wavelengths (S, M, L), though the response curves were assumed to
overlap. The three cones contain pigments with peak response at λ = 447 nm (“short”, or
blue), 540 nm (“medium”, or green), and 577 nm (“long”, or red). Approximate
sensitivity curves are shown in the figure:




CHAPTER  15  POINT  OPERATIONS 


Of  course,  some  humans  (usually  males)  are  born  without  the  use  of  one  of  the 
cone  types  (due  to  missing  connections  to  some  nerves  or  connections  to  the  wrong 
nerves),  or  a  cone  type  contains  an  incorrect  photopigment.  Such  folks  suffer  from 
color  blindness. 


Eye  Sensors 


Schematic of a rod (top) and a cone (bottom); light is incident from the left. The
shape of the “outer sections” of the receptors (to the right) leads to the names. The
photochemical is contained in these outer sections.


The  transduction  (transformation)  of  light  energy  into  electrical  energy  occurs  through 
a  chemical  reaction  via  a  photosensitive  dye.  The  photochemical  in  rods  is  called 
rhodopsin  (“visual  purple”),  which  is  derived  from  vitamin  A  (hence  the  parental 
enticement  to  “eat  your  carrots”).  The  interaction  of  light  with  the  molecules  of 
visual  purple  causes  the  electrons  to  oscillate  and  change  the  shape  of  the  molecule. 
The  shape  change  creates  an  electrical  signal  that  is  transmitted  through  the  nerve 
synapse  to  the  brain.  The  “new”  molecular  shape  is  not  stable  and  it  returns  to 
the  original  shape  after  some  time  delay.  The  process  in  cones  is  similar,  but  not 
identical.  Because  cones  work  under  bright  illumination,  their  absorption  process  is 
less  efficient;  some  light  is  “discarded”  by  scattering  from  the  cones  themselves. 


Latency 

Because the absorption process is chemical, the response of receptor cells is not
instantaneous when light arrives or is removed. In the first case, the result is the
“latency effect”, while the continuation in response after light is removed is the
“persistence response”. The latter effect is the reason why movies and video can convey
the illusion of continuous motion from time-sampled data.


Brightness  adaptation  and  discrimination: 

Digital images are displayed as a discrete set of intensities, so it is important to
understand how the eye differentiates between intensity levels. The human visual system
adapts to a huge irradiance range of about 10^10, from the scotopic threshold (dimmest
light) to the bright glare limit. The response is described by a logarithmic function of
the incident light irradiance.

In photopic vision alone, the range of irradiances is about 10^6. The transition from
scotopic to photopic vision is gradual, from 0.001 to 0.1 millilamberts (−3 to −1 on a
log scale in mL). This huge dynamic range is accomplished by changing the overall
sensitivity of the eye to the average brightness via the process of adaptation. It is
also important to understand the ability of the eye to discriminate between irradiance
changes at any specific adaptation level:

Assume that the retina is flooded with a uniform field of illumination of level I.
Then imagine a short, localized flash in the center of the field of view with level
I + ΔI:


Field used to evaluate the Weber ratio.


The Weber ratio is:

\[
\text{Weber ratio} = \frac{\Delta I}{I}
\]

where ΔI is the increment that a subject detects 50% of the time. A large Weber
ratio implies poor sensitivity, meaning that a large percentage change in intensity is
required for perception.
Plot of log[ΔI/I] versus log[I].


At low levels of log[I] (i.e., in the dark range), log[ΔI/I] is large, which means
that the ability to discriminate brightnesses is poor. As log[I] increases (so that the
background illumination increases), the Weber ratio decreases, which means that the
ability to discriminate brightnesses improves.

The visual system tends to undershoot or overshoot around the boundary between regions
of different intensities.

Note that the Weber ratio is large at low levels of illumination. The two branches
of the curve indicate that vision is carried out by the rods at low levels of
illumination and by the cones at high levels.
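
As a minimal sketch of the definition above, the Weber ratio and its logarithm can be computed directly. The function name `weber_ratio` and the sample values are assumptions chosen for illustration; they are not measured data.

```python
import math

def weber_ratio(delta_i, i):
    """Return the Weber ratio delta_i / i for a background level i > 0."""
    if i <= 0:
        raise ValueError("background level must be positive")
    return delta_i / i

# Hypothetical example: an increment of 2 units is just detectable
# (50% of the time) on a background of 100 units.
ratio = weber_ratio(2.0, 100.0)
log_ratio = math.log10(ratio)   # negative, since the ratio is below 1
```

A smaller ratio corresponds to better brightness discrimination at that adaptation level.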


15.7.2  Convergence 

The large number of sensors must be connected to the brain. Since the processing
capacity of the brain is limited (though prodigious!), the signals from multiple sensors
are combined into a smaller number of nerves by the process of convergence. The
“degree” of convergence is very different for rods and cones: approximately 120 rods
converge together, compared with approximately 6 cones. In the fovea, some cones may
have their own ganglia.


Spatial  Effect  of  Convergence 

The  individual  sensors  are  tied  together  by  a  neural  network  of  ganglion  cells  within 
the  retina. 
Schematic of neural net for lateral inhibition; a strong signal on receptor A inhibits
the signal at the neuron for receptor B.


The  lateral  connections  between  cells  are  weighted  to  diminish  the  response  from 
neighbors  if  one  cell  is  stimulated  by  a  strong  signal.  This  lateral  inhibition  has  the 
effect  of  introducing  an  analogue  of  an  impulse  response  to  the  imaging  system,  where 
the  response  “on  axis”  is  positive  and  those  of  immediate  neighbors  are  negative.  If 
we  think  of  the  system  as  linear  (which  it  most  certainly  is  not,  but  is  an  effective 
model  if  the  eye  is  well  adapted  to  a  small  range  of  “brightnesses”)  and  shift-invariant, 
then  we  can  construct  a  corresponding  transfer  function  via  the  Fourier  transform.  A 
measurement of the eye response under these conditions leads to some valid
conclusions.


Spatial  Frequency 

Campbell-Robson chart for estimating the contrast sensitivity function.

The Campbell-Robson chart shown in the figure consists of a “chirped” sinusoidal
grating whose frequency increases linearly to the right and whose modulation
decreases vertically. When the chart is viewed at a fixed distance, the observer typically
can draw a line on the chart where the grating modulation “disappears”; this line
typically has a peak value of the modulation at a nonzero spatial frequency.
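
A chart of this kind can be sketched in a few lines. The version below is only an illustration under stated assumptions: the function name `campbell_robson`, the grid size, the frequency limits, and the choice of a linear fall-off of modulation from the first row to the last are all hypothetical.

```python
import math

def campbell_robson(width=256, height=64, f_min=1.0, f_max=32.0):
    """Return a height x width grid of values in [0, 1].

    f_min and f_max are the grating frequencies (cycles per chart width)
    at the left and right edges; the frequency grows linearly in between.
    """
    rows = []
    for r in range(height):
        # Assumed: modulation falls linearly from 1 (row 0) to 0 (last row).
        m = 1.0 - r / (height - 1)
        row = []
        for c in range(width):
            x = c / (width - 1)  # normalized horizontal position, 0..1
            # Phase is the integral of the instantaneous frequency.
            phase = 2.0 * math.pi * (f_min * x + 0.5 * (f_max - f_min) * x * x)
            row.append(0.5 * (1.0 + m * math.sin(phase)))
        rows.append(row)
    return rows
```

Each row is a chirped sinusoid about a mean gray of 0.5; the last row has zero modulation and is uniformly gray.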
Approximate contrast sensitivity function of the HVS, showing peak monochromatic
(“luminance”) response at a spatial frequency of approximately 6 cycles per angular
degree (≈ 1.7 × 10^-3 cycles per arcsecond).


This observation demonstrates that the response (i.e., the output modulation) of
the HVS is larger than zero at DC, increases to its peak response at a frequency of
about six cycles per angular degree (≈ 344 cycles per radian), and falls off at larger
spatial frequencies, as shown in (b). The corresponding impulse response resembles
that shown below in (a).


The model of the “impulse response” of the eye (a) and the corresponding transfer
function (b). The impulse response shows the lateral inhibition.


In  this  model,  the  impulse  response  of  the  HVS  is  positive  at  the  origin,  but 
becomes  negative  within  a  short  distance.  In  words,  this  indicates  that  the  response 
of  an  eye  receptor  to  a  bright  source  actually  subtracts  from  the  neighboring  receptors; 
we  actually  say  that  the  response  of  one  receptor  inhibits  the  responses  of  those  nearby. 
In  other  words,  these  neighboring  receptors  must  be  stimulated  more  strongly  to  elicit 
the  same  response  as  the  first.  The  HVS  “processor”  that  generates  this  response  is 
the  network  of  neural  connections  behind  the  retina,  the  so-called  visual  neural  net. 
We  can  use  this  linear  shift-invariant  model  of  the  HVS  to  calculate  the  visual  response 
to  a  specific  input  image.  If  the  stimulus  is  a  simple  “stairstep”  input  irradiance, 
with  regular  steps  of  increasing  brightness,  the  output  exhibits  “overshoots”  at  the 
edges.  This  is  the  phenomenon  of  Mach  Bands,  which  result  in  a  nonuniform  visual 
appearance  of  uniform  areas  in  the  vicinity  of  a  transition  in  gray  level.  The  eye 
response  enhances  edges. 
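
The overshoot at the edges of a stairstep input can be reproduced with a small numerical sketch. The three-tap kernel below (positive center, negative neighbors, weights summing to 1) is a hypothetical stand-in for the impulse response of the HVS, not a measured one, and the helper name `convolve_replicate` is an assumption.

```python
def convolve_replicate(signal, kernel):
    """1-D filtering with edge replication; kernel length must be odd.

    The kernel used below is symmetric, so correlation and convolution
    give the same result.
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

# Stairstep input with three uniform regions.
staircase = [1.0] * 8 + [2.0] * 8 + [3.0] * 8
# Hypothetical lateral-inhibition kernel: center excites, neighbors inhibit.
kernel = [-0.25, 1.5, -0.25]
response = convolve_replicate(staircase, kernel)
```

Inside each uniform region the output equals the input, but the sample just before each step dips below its region's level and the sample just after rises above it, which is the Mach-band behavior described above.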
The effect on the image of lateral inhibition: (a) the input is a “stairstep” function;
(b) the output shows the “overshoot” characteristic of edge enhancement.

15.7.3  Eye  Motions 

The  photoreceptors  do  not  respond  to  luminance,  but  rather  to  changes  in  luminance; 
in  other  words,  the  eye  is  an  “AC  system.” 

Saccadic  motion 

Ballistic  point-to-point  jumps  that  the  eye  executes  1  to  3  times  per  second  when 
viewing  a  scene. 

Drift,  tremor,  flicker 

Minute, involuntary motions that the eye executes continuously. They are necessary to
maintain vision: if the eyes were held in a fixed position, the image would fade.

15.7.4  Eye  Lens 

The  power  of  the  eye  is  due  to  the  refractive  effect  of  the  cornea  and  of  the  eye  lens. 
As  we  saw  during  our  discussion  of  optics,  the  cornea  actually  provides  more  of  the 
power,  while  the  lens  varies  the  power  of  the  system. 

Accommodation 

Accommodation (fine focusing) is performed by the crystalline lens, which is
suspended behind the iris by ligaments connected to the ciliary muscles. When relaxed,
these muscles pull outward to bring the lens into its “flattest” configuration, so that
the radii of the surfaces lengthen, producing a longer focal length. For a perfect
eye, light from an object at infinity is focused on the retina when the ciliary muscles
are completely relaxed. If the object is positioned closer to the eye, the ciliary muscles
contract, relieving the external tension on the periphery of the lens. The radii of the
lens surfaces decrease, and thus so does the focal length. This ensures that the image
distance remains unchanged. The closest location of an object where the eye can
accommodate is the near point, which tends to increase with age (≈ 70 mm for a teenager,
≈ 120 mm for a young adult, ≈ 350 mm at middle age, and ≈ 80 mm at 60 years of
age). People whose near points are abnormally close to the eye are nearsighted. This
trait was actually valuable in ancient times because nearsighted people could see fine
detail up close, as when working on jewelry. Since the ability of the eye to
accommodate has some hereditary basis, it was common for families to have a tradition
of this kind of precision work.

The normal wavelength range of human vision is 390 nm ≲ λ ≲ 780 nm (which actually
seems to be too large, particularly on the red end). The limitation on sensitivity to
ultraviolet wavelengths is due to absorption by the crystalline lens.

Nearsightedness  (myopia) 

Myopic eyes bring parallel rays to focus in front of the retina; in other words, the
power of the lens is too large. Myopia is corrected by placing another lens in front
of the eye such that the focal point of the combined lens-eye system is on the retina,
without changing the power of the eye + auxiliary lens system. We saw in the section
on optics that the system power is unchanged if an additional lens is placed at the
front focal point of the original system. This can be seen by applying the two-lens
equation with the separation distance $t = (P_{\text{eye}})^{-1}$:

\[
P_{\text{system}} = P_{\text{corrector}} + P_{\text{eye}}
  - P_{\text{corrector}} \cdot P_{\text{eye}} \cdot \left(P_{\text{eye}}\right)^{-1}
  = P_{\text{corrector}} + P_{\text{eye}} - P_{\text{corrector}}
\]
\[
\Longrightarrow P_{\text{system}} = P_{\text{eye}}
\]
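
The two-lens result can be checked numerically. This is a sketch only: the function name `system_power`, the nominal eye power of 60 diopters, the cornea power of 43 diopters, and the −2 diopter corrector are illustrative assumptions, not values taken from the text.

```python
def system_power(p1, p2, t):
    """Power of two thin lenses in air separated by distance t.

    Powers are in diopters (1/m) and t is in meters.
    """
    return p1 + p2 - p1 * p2 * t

p_eye = 60.0         # assumed nominal eye power (illustrative)
p_corrector = -2.0   # hypothetical negative lens for a myopic eye

# Spectacle lens at the front focal point of the eye: t = 1 / p_eye.
p_spectacles = system_power(p_corrector, p_eye, 1.0 / p_eye)

# Contact lens on the cornea: zero separation, so the powers simply add.
p_cornea = 43.0      # assumed cornea power (illustrative)
p_contact = system_power(p_corrector, p_cornea, 0.0)
```

With the corrector at the front focal point the system power equals the power of the eye alone, while at zero separation the result is the plain sum of the two powers.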

If  you  are  nearsighted  and  wear  glasses,  note  that  the  images  are  the  same  size  with 
and  without  correction.  Since  corrective  contact  lenses  are  placed  directly  upon  the 
cornea,  we  can  assume  that  the  distance  between  the  corrective  lens  and  the  power 
due  to  the  first  surface  of  the  cornea  is  zero,  thus  the  power  of  the  corrected  cornea 
is  the  sum  of  the  powers  of  the  two  lenses: 

\[
P_{\text{corrected cornea}} = P_{\text{corrector}} + P_{\text{cornea}}
\]

Farsightedness  (hyperopia) 

In this condition, the second focal point lies behind the retina, i.e., the lens power is
insufficient or the lens is too close to the retina. To increase the bending of the rays,
a positive lens is placed in front of the eye. A farsighted person can see distant objects
sufficiently well, but cannot bring objects close to the eye to increase the angular
subtense of fine structure on the retina because the near point is farther away than
normal.
Astigmatism

This  is  an  aberration  of  the  eye  system  due  to  different  radii  of  curvature  of  the  cornea 
along  different  axial  directions,  thus  resulting  in  different  focal  lengths  for  horizontal 
and  vertical  objects. 

15.7.5  Spatial  resolution 

The  metric  of  “visual  acuity” 

•  Seek  the  ability  to  characterize  the  resolving  power  of  the  eye. 

•  Snellen  tests  used  by  optometrists 

•  Test for visual acuity where the patient reads Snellen’s chart at a certain
distance with one eye, then with the other, and then with both eyes. Acuity is
given by the longest distance at which the patient is able to read the letters,
divided by the distance at which a person with normal vision is able to read them.

15.8  Color  Vision 

The three cones contain pigments with peak response at λ = 447 nm (“short”, or
blue), 540 nm (“medium”, or green), and 577 nm (“long”, or red). Approximate
sensitivity curves are shown:


The response of short, medium, and long cones as a function of wavelength (nm).

Curves have been determined through physiological measurements of absorption
spectra of individual cones in vitro and through psychophysical color matching
experiments. The curves are normalized.

The responses of the cones differ, and the numbers of short, medium, and long
(S, M, L) cones are not equal. There are significantly fewer S cones than the others, so
that the S cones contribute little to the overall brightness sensation.

The sensitivity curves provide the basis for color vision. Assume two lights with
two wavelengths (λ1 = 500 nm and λ2 = 600 nm). The first generates a response
from M cones that is twice as large as its L response. The second light produces
approximately twice the response in L cones as in M cones. Now change the relative
intensities of the two lights: the individual L and M responses will be different
(corresponding to the difference in luminance), but their ratios will be constant, which
means that we can separate changes in intensity from changes in color. One single type
of detector could not accomplish this!
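
The argument can be illustrated with a toy calculation. The relative sensitivities below are hypothetical numbers chosen only to reproduce the 2:1 ratios quoted in the text; they are not read from the measured cone curves, and the names are assumptions.

```python
def cone_responses(intensity, sens_l, sens_m):
    """Return (L, M) responses for a light of the given intensity."""
    return intensity * sens_l, intensity * sens_m

# Hypothetical relative sensitivities (L, M) at the two wavelengths.
SENS_500 = (0.2, 0.4)   # M response is twice the L response
SENS_600 = (0.4, 0.2)   # L response is twice the M response

def m_over_l(intensity, sens):
    l_resp, m_resp = cone_responses(intensity, *sens)
    return m_resp / l_resp

# The ratio identifies the wavelength regardless of intensity.
ratio_dim = m_over_l(1.0, SENS_500)
ratio_bright = m_over_l(10.0, SENS_500)
```

Changing the intensity scales both responses together, so the M/L ratio stays at 2 for the 500 nm light and at 0.5 for the 600 nm light.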

15.8.1  Color  Matching,  Metamers 

A  consequence  of  color  matching  is  that  two  colors  can  appear  identical  to  the  eye  even 
though  they  have  different  spectral  compositions.  These  are  called  metamers.  They 
appear  to  match  under  one  illumination  but  mismatch  when  viewed  under  another. 
This  is  because  color  is  a  sensation  rather  than  a  property  of  an  object;  the  cones  can 
register  the  same  sensation  from  an  infinite  variety  of  combinations  of  different  light 
frequencies  and  amplitudes. 

Grassmann’s Laws

The eye can distinguish only three attributes of a light: hue, brightness, and
saturation. Hue is the psychological dimension of color that roughly corresponds to
wavelength. Brightness is the psychological dimension of color that most closely
relates to physical intensity. Saturation is the amount of hue a color possesses and
is most closely correlated with spectral purity.

Two lights of the same color added respectively to two other lights of the same color
produce mixtures of the same color: if a = b and c = d, then a + c = b + d. Two lights
of the same color each subtracted respectively from two other lights of the same color
leave mixtures of the same color: if a = b and c = d, then a − c = b − d. If one unit
of light a has the same color as one unit of b, then ka = kb. The luminance produced by
the additive mixture of a number of lights is the sum of the luminances produced
separately by each light.

Mathematically, Grassmann’s laws define a vector space, so we can represent
any color as a vector in a 3-D space where the basis vectors are the primary colors.
The primary colors must be linearly independent, meaning that no one of them can be
created as a weighted sum of the others. In this vector space, an arbitrary color C
can be matched by appropriate quantities of the three primaries R, G, B:

\[
C = r_1 R + g_1 G + b_1 B
\]

where $r_1$, $g_1$, $b_1$ are the weights (number of units) applied to each primary to
make the match.
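
Finding the weights for a given color amounts to solving a 3 × 3 linear system. The sketch below uses Cramer's rule; the function names and the numerical primaries are hypothetical and serve only to illustrate the vector-space picture.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def match_weights(primaries, color):
    """Solve color = r*R + g*G + b*B for (r, g, b) by Cramer's rule.

    primaries is a list [R, G, B] of 3-vectors; color is a 3-vector.
    """
    # The columns of the matrix are the primary vectors.
    m = [[primaries[col][row] for col in range(3)] for row in range(3)]
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("primaries are not linearly independent")
    weights = []
    for col in range(3):
        mc = [list(row) for row in m]
        for row in range(3):
            mc[row][col] = color[row]   # replace one column by the target
        weights.append(det3(mc) / d)
    return tuple(weights)

# Hypothetical primaries expressed in some fixed 3-D coordinate system.
R = (1.0, 0.2, 0.0)
G = (0.3, 1.0, 0.1)
B = (0.0, 0.1, 1.0)
target = tuple(0.5 * r + 0.25 * g + 0.125 * b for r, g, b in zip(R, G, B))
weights = match_weights([R, G, B], target)
```

A target that lies outside the range reachable with positive amounts of the primaries yields one or more negative weights, and linearly dependent primaries are rejected because the system has no unique solution.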

Trichromacy

The weighted sum means that every colored light C is the physical sum of a number of
essentially pure spectral components and can be described by a function E[λ] (radiant
power per unit wavelength). Repeating the color matching experiment across a wide range
of test wavelengths produces the so-called color matching functions for primaries at
436 nm, 546 nm, and 700 nm. The color matching functions reveal the fundamental fact
that human color vision is trichromatic. Trichromacy simply means that any visible color
can be matched by a mixture of three primary colors.


Color  matching  functions 


Note that the color matching functions include negative weights. For example, this
means that λ = 500 nm cannot be matched by the three primaries unless some red
is subtracted from the mixture (equivalently, added to the test light being matched).
No real colors exist that can be used as primaries such that the color matching
functions are wholly positive. In 1931, the CIE adopted a set of fictitious primaries
that result in wholly positive functions:


CIE  synthesized  color  matching  functions  with  positive  weights. 


15.8.2  XYZ  Color  Space 

To find the X, Y, Z values corresponding to a particular color object, let R[λ] be
the spectral transmittance or reflectance of the object, and let P[λ] be the spectral
power per unit wavelength interval of the illuminating source. Also let
$\bar{x}_\lambda$, $\bar{y}_\lambda$, and $\bar{z}_\lambda$ be the CIE positive-definite
color matching functions:

\[
\begin{aligned}
X &= k \int_{380}^{780} P(\lambda)\, R(\lambda)\, \bar{x}_\lambda \, d\lambda \\
Y &= k \int_{380}^{780} P(\lambda)\, R(\lambda)\, \bar{y}_\lambda \, d\lambda \\
Z &= k \int_{380}^{780} P(\lambda)\, R(\lambda)\, \bar{z}_\lambda \, d\lambda
\end{aligned}
\]


where k is a normalizing factor such that Y = 100 when the object is a perfect white
diffuser or a perfect transmitter of light over the entire visible band. Y is therefore
proportional to total luminance and matches the photopic response of the eye. The
discrete forms of the equations are:

\[
\begin{aligned}
X &= k \sum_{\lambda=380}^{780} P(\lambda)\, R(\lambda)\, \bar{x}_\lambda \\
Y &= k \sum_{\lambda=380}^{780} P(\lambda)\, R(\lambda)\, \bar{y}_\lambda \\
Z &= k \sum_{\lambda=380}^{780} P(\lambda)\, R(\lambda)\, \bar{z}_\lambda
\end{aligned}
\]
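
The discrete sums translate directly into code. In this sketch the function name `tristimulus` is an assumption, and the four-sample arrays are made-up placeholders, NOT the tabulated CIE color matching functions; real use would substitute the published tables sampled at the same wavelengths as the source and reflectance spectra.

```python
def tristimulus(power, reflectance, xbar, ybar, zbar):
    """Discrete X, Y, Z sums over equally spaced wavelength samples.

    All arguments are equal-length sequences sampled at the same
    wavelengths. k is chosen so that Y = 100 for a perfect white
    diffuser (reflectance = 1 at every wavelength).
    """
    k = 100.0 / sum(p * y for p, y in zip(power, ybar))
    x = k * sum(p * r * c for p, r, c in zip(power, reflectance, xbar))
    y = k * sum(p * r * c for p, r, c in zip(power, reflectance, ybar))
    z = k * sum(p * r * c for p, r, c in zip(power, reflectance, zbar))
    return x, y, z

# Made-up placeholder samples (not CIE data), four wavelengths only.
power = [1.0, 1.0, 1.0, 1.0]
xbar = [0.2, 0.1, 0.8, 0.3]
ybar = [0.05, 0.6, 0.9, 0.1]
zbar = [1.0, 0.3, 0.0, 0.0]

white = [1.0, 1.0, 1.0, 1.0]
X_w, Y_w, Z_w = tristimulus(power, white, xbar, ybar, zbar)
```

Whatever data are supplied, the normalization guarantees Y = 100 for the perfect white diffuser, and a spectrally flat gray of reflectance 0.5 gives Y = 50.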


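Determination of X, Y, Z

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
=
\begin{bmatrix} r_1 & g_1 & b_1 \\ r_2 & g_2 & b_2 \\ r_3 & g_3 & b_3 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]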

The color can be characterized independent of its luminance via:

\[
x = \frac{X}{X + Y + Z}, \qquad
y = \frac{Y}{X + Y + Z}, \qquad
z = \frac{Z}{X + Y + Z} = 1 - (x + y)
\]

which are known as chromaticity coordinates. The normalization reduces the 3D
color space to a 2D plane that satisfies the constraint x + y + z = 1. Purely additive
combinations are found inside the triangle:
Plane that satisfies the constraint x + y + z = 1.


All colors that are possible by a combination of two primaries will lie on a straight
line connecting the primaries in this space. Any color that can be derived from any
three primaries will lie inside the triangle whose vertices are the primaries; this
triangle represents the gamut of the primaries.
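
Both properties (independence from luminance and the straight-line rule for mixtures) can be checked with a short sketch. The function name `chromaticity` and the tristimulus values are hypothetical illustration values.

```python
def chromaticity(X, Y, Z):
    """Return the chromaticity coordinates (x, y, z) of a color."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("X + Y + Z must be nonzero")
    return X / total, Y / total, Z / total

# Two hypothetical colors and their additive mixture (tristimulus values add).
c1 = (20.0, 30.0, 10.0)
c2 = (40.0, 10.0, 50.0)
mix = tuple(a + b for a, b in zip(c1, c2))

x1, y1, _ = chromaticity(*c1)
x2, y2, _ = chromaticity(*c2)
xm, ym, _ = chromaticity(*mix)

# A zero cross product means the three (x, y) points are collinear.
cross = (x2 - x1) * (ym - y1) - (y2 - y1) * (xm - x1)
```

Scaling all three tristimulus values by the same factor leaves (x, y, z) unchanged, and the mixture point falls on the segment joining the two component points.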


15.8.3  CIE Chromaticity Diagram


Additive mixtures of colors lie along a straight line connecting those colors. The
complement of any color is found by extending a straight line from that color through
white to the opposite side of the CIE diagram.
Definition 15.1  Chromaticity Diagram: A two-dimensional Cartesian plot that
depicts the multidimensional subjective relationship among colors perceived by the
normal human visual system (eyes and nervous system, including the brain) when
additively stimulated by (two or more; usually three) discrete monochromatic visible
sources (wavelengths). Note: The familiar CIE chromaticity diagram depicts
perceived colors plotted as a function of the normalized relative intensity of a defined
red (increasingly red with increased “X”, or abscissa, value) versus the normalized
relative intensity of a defined green (increasingly green with increased “Y”, or
ordinate, value). With respect to a given perceived color, as plotted on the chromaticity
diagram, the normalized relative intensity of a defined blue at any point is obtained
by adding the normalized relative intensities of the red and green, and subtracting the
total from 1.


http://www.atis.org/tg2k/_chromaticity_diagram.html

In 1931, the CIE defined three standard primaries (X, Y, Z). The Y primary
was intentionally chosen to be identical to the average luminous-efficiency function of
human eyes. The three standard CIE RGB primary colors are monochromatic lights at
λ = 435.8 nm, 546.1 nm, and 700 nm.

The CIE has recommended the use of other color spaces, all derived from XYZ. L*a*b*
is used for non-luminous objects such as textiles, paints, plastics, etc. L*u*v* is
helpful in the registration of color differences experienced with flashes, photography,
television screens, etc. These systems are useful in specifying small differences between
color stimuli. The motivation was to find coordinates that relate in a linear fashion
to the perceptual attributes of color (perceptually uniform color spaces).

L*a*b* 

•  L*  is  the  lightness  axis  and  extends  from  0  (black)  to  100  (white). 

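•  a* represents the redness-greenness

•  b* represents the yellowness-blueness

For ratios larger than about 0.008856, the CIE 1976 definitions are:

\[
\begin{aligned}
L^* &= 116 \left(\frac{Y}{Y_n}\right)^{1/3} - 16 \\
a^* &= 500 \left[\left(\frac{X}{X_n}\right)^{1/3} - \left(\frac{Y}{Y_n}\right)^{1/3}\right] \\
b^* &= 200 \left[\left(\frac{Y}{Y_n}\right)^{1/3} - \left(\frac{Z}{Z_n}\right)^{1/3}\right]
\end{aligned}
\]

where $X_n$, $Y_n$, $Z_n$ are the coordinates of the reference white.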
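
A minimal sketch of the standard CIE 1976 conversion from XYZ to L*a*b* follows. It uses the usual piecewise function (cube root above a small threshold, linear below it); the function names are assumptions, and the reference white is the commonly quoted D65 triple, used here only as an example.

```python
def _f(t):
    """Piecewise function of the CIE 1976 L*a*b* definition."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    # Linear segment for very dark colors.
    return t / (3.0 * delta ** 2) + 4.0 / 29.0

def xyz_to_lab(X, Y, Z, white):
    """Convert XYZ to (L*, a*, b*) relative to the reference white."""
    Xn, Yn, Zn = white
    fx, fy, fz = _f(X / Xn), _f(Y / Yn), _f(Z / Zn)
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return L, a, b

# Example reference white (commonly quoted D65 values, Y normalized to 100).
D65 = (95.047, 100.0, 108.883)
lab_white = xyz_to_lab(*D65, white=D65)
lab_black = xyz_to_lab(0.0, 0.0, 0.0, white=D65)
```

The reference white maps to L* = 100 with a* = b* = 0, and black maps to L* = 0, which matches the stated range of the lightness axis.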

15.8.4  Color  Reproduction 

A  color  imaging  system  usually  goes  through  three  steps  in  order  to  reproduce  an 
image: 

1.  Color separation: Derive R, G, B signals from the input scene, just as the eye
would do. The system needs more than one kind of detector.

2.  Processing: The system must convert the R, G, B signals from the detectors into
output signals that are suitable for the color reconstruction stage, for example
a current that controls the amount of red dye sent to a nozzle. Signal processing
may be electronic (TV), chemical (photography), or a more complex combination.

3.  Reconstruction: The image must be produced by whatever means are appropriate:
dyes, phosphors, etc. There are two basic methods of color reproduction:
subtractive and additive.
Subtractive Color Reproduction

Color  printing  uses  three  colored  dyes:  cyan,  magenta  and  yellow.  These  can  be 
thought  of  as  complementary  to  the  additive  primaries  red,  green  and  blue.  In  other 
words,  cyan  absorbs  the  red  part  of  the  spectrum  so  by  varying  its  concentration 
we  can  vary  the  absorption  of  red  while  having  small  effect  on  other  parts  of  the 
spectrum. 
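
In the simplest idealization, each dye amount is the complement of the corresponding additive primary. The sketch below assumes ideal "block" dyes that absorb only their own third of the spectrum, which real dyes do not; the function names are hypothetical.

```python
def rgb_to_cmy(r, g, b):
    """Idealized complement: dye amount = 1 - additive primary (all in 0..1)."""
    return 1.0 - r, 1.0 - g, 1.0 - b

def cmy_to_rgb(c, m, y):
    """Inverse of the idealized relation above."""
    return 1.0 - c, 1.0 - m, 1.0 - y

# Red needs no cyan (which would absorb red) but full magenta and yellow.
red_as_cmy = rgb_to_cmy(1.0, 0.0, 0.0)
```

Under this model, increasing the cyan concentration lowers only the red component, as described in the text.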


Additive  Color  Reproduction 

Additive reproduction is used with lights (e.g., phosphors) of the three additive
primary colors, red, green, and blue. There are phosphors with smooth spectral
distributions in the blue and green, but red phosphors tend to have “spiky” spectral
distributions. As a consequence, the full range of human color perception cannot be
reproduced.

According to Hunt, there are six different types of color reproduction:

1.  Spectral: Spectral reflectance or transmittance of the image matches the original
exactly. This is not possible in normal reproductions since the color gamut does
not encompass the entire “horseshoe”.

2.  Colorimetric: Reproduction matches the original in chromaticity and relative
luminance, i.e., the original and reproduced colors are metamers. Of course,
whether a reproduction is colorimetric varies with the observer.

3.  Exact: In addition to colorimetric reproduction we have equality of absolute
luminances, i.e., the appearance of colors does not depend on the illuminant
intensity.

4.  Equivalent: Chromaticities, relative luminance, and absolute luminance are
adjusted to achieve equality of appearance, i.e., we assume the viewing conditions
and adjust by making the reproduction colorimetrically incorrect but perceptually
correct.

5.  Corresponding: Similar to “equivalent” but does not require that absolute
luminances match, i.e., adjust chromaticity and relative luminance so as to achieve
equality of appearance as if the original were lit by the reproduction illuminant.

6.  Preferred: The reproduction deliberately departs from equality of appearance
because observers prefer the result; the usual example is Caucasian subjects who do
not like an accurate rendering of their skin tone.

15.8.5  Color  Spaces 

Red,  Green,  Blue 

This representation is usually graphed in a Cartesian system analogous to [x, y, z], as
shown:


RGB coordinates (8 bit) displayed as a Cartesian system. Locations with the same
value in each coordinate are “neutral”, e.g., [0,0,0] is “black”, [255,255,255] is
“white”, and others are “gray”. Pure colors appear along the axes.


Hue,  Saturation,  Lightness  (or  Brightness,  or  Value): 

This  is  the  representation  that  led  to  Young’s  theory: 


•  Hue corresponds to the common definition of color, e.g., “red”, “orange”,
“violet”, etc., specified by the dominant wavelength in a spectral distribution,
though a “dominant” wavelength may not actually be present.

•  Saturation (also called chroma): an expression for the “strength” or “purity”
of a color. The intensity of a very saturated color is concentrated near the
dominant wavelength. Looked at another way, saturation is measured as the
amount of “gray” in proportion to the hue. All colors formed from one or two
primaries have 100% saturation; if some amount of the third primary is added
to a color formed from the other two, then there is at least some “gray” in
the color and the saturation decreases. The saturation of a pure white, gray,
or black scene (equal amounts of all three primaries) is zero. A mixture of a
purely saturated color (e.g., “red”) and white produces a “desaturated red”, or
“pink”. Saturation is reduced if surface reflections are present.

In more scientific terms, the saturation is the relative bandwidth of the visible
output from a light source. A source with a large saturation has a narrow
bandwidth, and vice versa. As saturation increases, colors appear more “pure”.
As saturation decreases, colors appear more “washed-out”.


•  Brightness: sensation of intensity of a light, from dark through dim to
bright.

•  Lightness: a relative expression of the intensity of the energy output
reflected by a surface; “blackness”, “grayness”, “whiteness” of a visible light
source. It can be expressed as a total energy value or as the amplitude at
the wavelength where the intensity is greatest.

HSB is often represented in a cylindrical coordinate system analogous to (r, θ, z).
The saturation coordinate is plotted along the radial axis, the hue as the azimuthal
coordinate, and the lightness as the vertical (z) axis. The hue determines the
frequency of light, the position in the spectrum, or the relative amounts of red, green,
and blue. It is a continuous and periodic scale that often is measured as an angle
in angular degrees (the “hue angle”), though it also may be normalized to be
compatible with 8-bit numerical representations. Hues located at the extrema (e.g.,
angles of ±180°) are identical, as shown in the figure taken from the hue adjustment
in Adobe Photoshop™. A pure hue is 50% luminosity, 100% saturation. The hue
angles are shown, where red corresponds to an angle of 0°.


The hue representation used in Adobe Photoshop™, with the hue axis running from
−180° through 0° to +180°. The hue at angle 0° is “red”.
To illustrate the continuity of the hue circle, it may be rotated about the azimuth
to show how it “wraps around” at cyan (θ = 180°), the complementary color to
red.


The hue axis after rotation by 180°, showing the “wraparound” at the edge of the
axis.

Hue is represented by 8-bit integer values in other applications, such as PowerPoint™.
A list of the primary colors for different hue angles is shown in the table. Note that
the additive primaries are located at 0° and ±120°, while the subtractive primaries
are at ±60° and 180°. Colors at opposite sides of the hue circle (separated by 180°)
are complementary, so that the sum of two complementary colors produces white.

The sum of monochromatic yellow ($\lambda = 580$ nm) and monochromatic blue
($\lambda = 480$ nm) produces white light that looks just as white as the sum of all
visible wavelengths.

Color     R     G     B     Hue   Saturation   Lightness (Value, Intensity)
          246   240   0     42    255          120
yellow    195   192   64    42    128          128
gray      128   128   128   42    0            128

Microsoft Windows color dialogs also use HSB but call the third dimension
“luminosity” or “lightness.” It ranges from 0% (black) to 100% (white).

Because the RGB model is quite simple, it is natural to ask what advantages the
HSL color model offers:

1. You can generate grey scales using only one parameter, the luminosity (set
saturation to 0).

2. You can vary the color without changing the brightness by varying the hue alone.

3. You can fade or darken several colors, or whole bitmaps, such that the lightness
(or darkness) stays in step.

The HSL model is easier to use visually because it suits the eye, whereas the RGB
model is easier to use in programming.

15.8.6  Conversion  from  RGB  to  HSL 

Recall the transformations between Cartesian coordinates $[x, y, z]$ and cylindrical
coordinates $(r, \theta, z)$:

\[
\begin{aligned}
r &= \sqrt{x^2 + y^2} & x &= r \cos[\theta] \\
\theta &= \tan^{-1}\left[\frac{y}{x}\right] & y &= r \sin[\theta] \\
z &= z & z &= z
\end{aligned}
\]

The transformation from Cartesian to cylindrical coordinates is nonlinear and thus
cannot be written as a matrix-vector product. The scheme for computing HSL from
RGB is:

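1. Normalize the three values [R, G, B] to the range [0, 1].

2. Find the maximum and minimum of the three values; these are $\mathrm{color}_{max}$ and
$\mathrm{color}_{min}$.

3. If all three RGB values are identical, then the hue and saturation are both 0.

4. Compute the lightness L via

\[ L = \frac{\mathrm{color}_{max} + \mathrm{color}_{min}}{2} \]

5. Test L:

\[
\begin{aligned}
\text{If } L < 0.5 \text{ then } S &= \frac{\mathrm{color}_{max} - \mathrm{color}_{min}}{\mathrm{color}_{max} + \mathrm{color}_{min}} \\
\text{If } L \geq 0.5 \text{ then } S &= \frac{\mathrm{color}_{max} - \mathrm{color}_{min}}{2 - \mathrm{color}_{max} - \mathrm{color}_{min}}
\end{aligned}
\]

6. Compute hue H via:

(a) If $\mathrm{color}_{max} = R$ then

\[ H = \frac{G - B}{\mathrm{color}_{max} - \mathrm{color}_{min}} \]

(b) If $\mathrm{color}_{max} = G$ then

\[ H = 2 + \frac{B - R}{\mathrm{color}_{max} - \mathrm{color}_{min}} \]

(c) If $\mathrm{color}_{max} = B$ then

\[ H = 4 + \frac{R - G}{\mathrm{color}_{max} - \mathrm{color}_{min}} \]

7. Convert L and S back to percentages, and H into an angle in degrees (i.e., scale
it from 0-360). H as computed in step 6 is in units of 60°, so multiply it by 60° and
add 360° to any negative result.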
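The seven steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name `rgb_to_hsl` is hypothetical, 8-bit inputs are assumed, S and L are returned as fractions in [0, 1] rather than percentages, and the hue from step 6 is assumed to be in units of 60°.

```python
def rgb_to_hsl(r, g, b):
    """Convert 8-bit RGB to (H in degrees, S, L), with S and L in [0, 1]."""
    # Step 1: normalize to [0, 1] (8-bit inputs are an assumption of this sketch)
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    # Step 2: largest and smallest of the three values
    cmax, cmin = max(r, g, b), min(r, g, b)
    # Step 4: lightness is the midpoint of the extremes
    light = (cmax + cmin) / 2.0
    # Step 3: a gray pixel has zero hue and zero saturation
    if cmax == cmin:
        return 0.0, 0.0, light
    delta = cmax - cmin
    # Step 5: the saturation formula depends on the lightness
    if light < 0.5:
        sat = delta / (cmax + cmin)
    else:
        sat = delta / (2.0 - cmax - cmin)
    # Step 6: hue in units of 60 degrees, depending on which channel is largest
    if cmax == r:
        hue = (g - b) / delta
    elif cmax == g:
        hue = 2.0 + (b - r) / delta
    else:
        hue = 4.0 + (r - g) / delta
    # Step 7: convert to degrees and wrap negative angles into [0, 360)
    return (hue * 60.0) % 360.0, sat, light
```

For instance, `rgb_to_hsl(255, 0, 0)` gives a hue of 0° with full saturation and 50% lightness, which matches the earlier description of a pure hue.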


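Example: for the 8-bit color [R, G, B] = [25, 204, 53]:

\[
R = \frac{25}{255} \cong 0.098, \qquad G = \frac{204}{255} = 0.800, \qquad B = \frac{53}{255} \cong 0.208
\]

\[
\begin{aligned}
\mathrm{color}_{max} &= \frac{G}{255} = 0.800 \\
\mathrm{color}_{min} &= \frac{R}{255} = 0.098 \\
L &= \frac{0.8 + 0.098}{2} = 0.449 < 0.5 \\
L &= 0.449 \cdot 255 \cong 115 \\
S &= \frac{0.8 - 0.098}{0.8 + 0.098} \cdot 255 = 0.782 \cdot 255 \cong 199 \\
H &= 2 + \frac{0.208 - 0.098}{0.8 - 0.098} \cong 2.157 \text{ (in units of } 60^\circ) \rightarrow +129.4^\circ
\end{aligned}
\]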


Conversion  from  HSL  to  RGB 

1. If S = 0, define R = G = B = L; otherwise, test L:

If $L < 0.5$, then $\alpha = L \cdot (S + 1)$

If $L \geq 0.5$, then $\alpha = L + S - L \cdot S$

2. Set

\[ \beta = 2.0 \cdot L - \alpha \]

3. Normalize hue angle H to the range [0, 1] by dividing by 360°.

4. For each of R, G, B, compute $\gamma$ as follows:

for R, $\gamma = H + \frac{1}{3}$

for G, $\gamma = H$

for B, $\gamma = H - \frac{1}{3}$

5. If $\gamma < 0$, then $\gamma = 1 + \gamma$; likewise, if $\gamma > 1$, then $\gamma = \gamma - 1$.

6. For each of [R, G, B], do the following test:

If $\gamma < \frac{1}{6}$, then color $= \beta + (\alpha - \beta) \cdot 6 \cdot \gamma$

If $\frac{1}{6} \leq \gamma < \frac{1}{2}$, then color $= \alpha$

If $\frac{1}{2} \leq \gamma < \frac{2}{3}$, then color $= \beta + (\alpha - \beta) \cdot (4 - 6\gamma)$

Otherwise, color $= \beta$

7. Scale back to the range [0, 100].

8. Repeat steps 6 and 7 for the other two colors.
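The inverse procedure can be sketched the same way. The Python below is a minimal illustration under stated assumptions: the name `hsl_to_rgb` is hypothetical, H is given in degrees, S and L are fractions in [0, 1], the output is left in [0, 1] rather than rescaled, and any $\gamma$ outside [0, 1) is wrapped back into that interval before the interval test.

```python
def hsl_to_rgb(hue_deg, sat, light):
    """Convert H (degrees), S and L (fractions in [0, 1]) to RGB in [0, 1]."""
    # Step 1: zero saturation means a gray pixel
    if sat == 0:
        return light, light, light
    if light < 0.5:
        alpha = light * (sat + 1.0)
    else:
        alpha = light + sat - light * sat
    # Step 2
    beta = 2.0 * light - alpha
    # Step 3: normalize the hue angle to [0, 1)
    h = (hue_deg / 360.0) % 1.0

    def channel(gamma):
        # Step 5: wrap gamma into [0, 1) (assumption: wrap both ends)
        gamma = gamma % 1.0
        # Step 6: piecewise-linear test on gamma
        if gamma < 1.0 / 6.0:
            return beta + (alpha - beta) * 6.0 * gamma
        if gamma < 0.5:
            return alpha
        if gamma < 2.0 / 3.0:
            return beta + (alpha - beta) * (4.0 - 6.0 * gamma)
        return beta

    # Step 4: one gamma per channel, offset by a third of the hue circle
    return channel(h + 1.0 / 3.0), channel(h), channel(h - 1.0 / 3.0)
```

Multiplying each returned value by 255 and rounding gives 8-bit values; for example, a hue of 0° with S = 1 and L = 0.5 returns pure red.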

Other  Descriptors  of  Colors 

•  Hue: a “pure” color, i.e., one containing no black or white.

•  Shade: a “dark” color, i.e., one produced by mixing a hue with black.

•  Tint: a “light” color, i.e., one produced by mixing a hue with white.

•  Tone: a color produced by mixing a hue with a shade of grey.


15.8.7  Phenomena  of  Color  Vision 


Afterimages 


This is a well-known optical phenomenon due to the photochemical sensitivity of the
eye. If the eye is exposed to a stationary scene for an extended period of time, the
photochemical dyes that are sensitive to a given color become “depleted.” If the eye
then views a neutral white background, it “sees” the complement to that color, because
only the dyes sensitive to the complementary color remain available. The phenomenon
often is illustrated by viewing a representation of the American flag rendered in the
colors that lie opposite on the hue circle (cyan for red, yellow for blue, and black for
white). Test images for demonstrating afterimages may be created in Adobe Photoshop™
by using the “inverse” option, which cascades a rotation of the hue by 180° and an
“inversion” (complementing) of the lightness.
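Within the HSL model of this section, rotating the hue by 180° and complementing the lightness gives the same result as complementing each RGB channel. The Python sketch below checks this with the standard-library `colorsys` module (note that it orders the triple as H, L, S). The function names are hypothetical, and nothing here is claimed about how Photoshop implements its own operation.

```python
import colorsys


def invert_via_hsl(r, g, b):
    """Rotate hue by 180 degrees and complement lightness; RGB in [0, 1]."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # colorsys orders the triple H, L, S
    return colorsys.hls_to_rgb((h + 0.5) % 1.0, 1.0 - l, s)


def invert_rgb(r, g, b):
    """Complement each RGB channel directly."""
    return 1.0 - r, 1.0 - g, 1.0 - b


def max_difference(color):
    """Largest channel difference between the two inversions of one color."""
    a = invert_via_hsl(*color)
    b = invert_rgb(*color)
    return max(abs(x - y) for x, y in zip(a, b))
```

For pure red, (1, 0, 0), both routes give cyan, (0, 1, 1), which is the complementary pair used in the flag demonstration.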


[Figure: four panels titled “Original Image,” “After Rotating ‘Hue’ by 180°,” “After
Complement of ‘Lightness,’” and “After ‘Inverse’ in Adobe Photoshop™.”] Constructing a
specimen for demonstrating the “afterimage”: rotate the hue by 180° and complement
the lightness, or just use the “inverse” operation in Adobe Photoshop™. Test it by
staring at the processed image for 30+ seconds and then look at a blank sheet of
white paper.

15.9  Color  and  Multispectral  Image  Processing

We now return to consider pixel processing of multiband images, which may include
images differing in color, time, or some other parameter. The gray value of each pixel
in the output image is determined from the gray values of the corresponding input
pixel in the various bands. For example, we can compose a monochromatic luminance
(lightness) image from the gray values of the image in each of the three additive
primary colors via:

\[
\begin{aligned}
g[x, y] &= \alpha f[x, y, \lambda_r] + \beta f[x, y, \lambda_g] + \gamma f[x, y, \lambda_b] \\
&= \alpha f_R[x, y] + \beta f_G[x, y] + \gamma f_B[x, y],
\end{aligned}
\]

where the coefficients $[\alpha, \beta, \gamma]$ are functions of the spectral filtration and
sensitivity of the recording process. The gray values of the same pixel in the three
images are generally different but correlated. For example, a red object will be bright
in the red image and darker in the green and blue. The decomposition of a crude color
image into its component RGB images is shown below; note that the histograms do
not exhibit easily distinguishable clusters and thus it will be difficult to segment the
objects from the image effectively.
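The weighted sum of bands is a pixel-by-pixel operation, as the following Python sketch illustrates. The function name `weighted_band_sum` is hypothetical, images are represented as nested lists, and the weights in the usage note are only an illustrative choice (they are the NTSC luminance weights quoted in Section 15.11.1), not values fixed by this equation.

```python
def weighted_band_sum(red, green, blue, weights):
    """Combine three equally sized 2-D bands into one gray-value image.

    Each output pixel is alpha*f_R + beta*f_G + gamma*f_B at the same [x, y].
    """
    alpha, beta, gamma = weights
    return [
        [alpha * r + beta * g + gamma * b for r, g, b in zip(row_r, row_g, row_b)]
        for row_r, row_g, row_b in zip(red, green, blue)
    ]
```

With weights (0.299, 0.587, 0.114), a gray pixel with R = G = B = 100 maps to a luminance of approximately 100, while a saturated red pixel (255, 0, 0) maps to about 76.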


[Figure: color image and its band images; surviving panel title “Red.”] Simple 3-band
color image, the individual monochromatic images, and their histograms. Note that
the gray values of the “tree” are approximately equal to those of “sky” in the red band
and approximately equal to those of both “sky” and “grass” in the green band. For
this reason, the tree is difficult to segment from a single color channel.


The  three  1-D  histograms  do  not  exhibit  distinct  pixel  clusters  for  each  of  the  five 
object  colors  (red  house,  green  tree,  pale  green  grass,  blue  sky,  and  white  clouds  and 
door).  In  other  words,  the  clusters  of  pixels  belonging  to  these  five  classes  overlap. 
This  observation  forces  the  conclusion  that  the  five  objects  cannot  be  segmented 
successfully  from  a  single  band.  That  said,  we  can  observe  that  the  spectral  reflectance 
of  pixels  belonging  to  the  “house”  is  significantly  different  from  the  other  objects, 
and those pixels may be segmented from the other objects fairly easily (e.g., by
a simple threshold in the blue image, as shown after selecting a threshold level of
112). However, the reflectance of the pixels belonging to the tree in the blue image
is  only  slightly  different  from  those  belonging  to  sky.  Attempts  to  segment  pixels 
belonging  to  the  “tree”  from  the  blue  image  are  shown,  which  lead  to  either  a  “noisy” 
segmentation,  or  the  misidentification  of  the  “tree”  and  the  “grass.” 
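A single-band threshold of this kind is easy to express in code. The Python sketch below is illustrative only: the function name `threshold_range` is hypothetical, and the sample gray values in the usage note are made up rather than taken from the image discussed here.

```python
def threshold_range(band, low, high):
    """Return a mask that is 0 (black) where low < value < high, else 255 (white)."""
    return [[0 if low < value < high else 255 for value in row] for row in band]
```

For a row of blue-band values `[100, 120, 150]`, the call `threshold_range([[100, 120, 150]], 112, 140)` marks only the middle pixel as black. Pixels from different objects that share the same blue gray value cannot be separated this way, which is the limitation described above.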


[Figure: thresholded blue images in two rows of panels, labeled (b < 140) = black,
(b < 112) = black, (140 > b > 112) = black; and (b < 167) = black, (b < 112) = black,
(167 > b > 112) = black.] Two attempts to segment the “tree” from the blue image alone.
The first sets all pixels to black in the range 112 < b < 140, which produces a very
noisy “tree.” Pixels in the second image are thresholded to black if they lie in the
range 112 < b < 167. The segmented pixels include the “grass.”

15.10  Multispectral  Histograms  for  Feature  Extraction


As already described, visible-light color images are represented by triads of
monochrome images. The decomposition of color images into RGB bands is very
common. Within the neural network of the eye, the three cone signals are weighted and
combined to generate the three channels that are transmitted to the brain. Roughly
speaking, one channel corresponds to the lightness or luminance of the scene (i.e., the
black-and-white video signal), and is a weighted sum (integral) of S, M, and L. This
information is transmitted to the brain with full resolution. The other two transmitted
signals are weighted differences (derivatives) of S, M, and L and describe the
chrominance of the scene; these are transmitted with reduced resolution, thus
preserving information deemed more important during evolutionary development. Broadcast
transmission of color video signals is roughly modeled on the weighted summing of
cone signals for transmission to the brain.


In digital imaging, the three raw cone signals generated by the human visual
system can be represented as three digital images: roughly the brightness in blue,
green, and red light. The weighted summing of the cone signals for transmission to the
brain may be modeled as a linear operation applied to the 3-element vector [R, G, B].
The operation is implemented by an invertible 3 × 3 matrix. The requirement that
the matrix be invertible ensures that the 3-element input vector may be recovered
from the output vector. However, note that if the output values are quantized, as
required before subsequent digital processing, then the cascade of forward and inverse
transformations may not yield the identical triplet [R, G, B].


The variations of colors of the objects in an image may be used to segment a
multispectral image $f[n, m, \lambda_i]$. To successfully segment the “tree” pixels from the
color image, it is necessary to use the multispectral information simultaneously. One
way is to use a multidimensional (2-D or 3-D) histogram generated from two or three
of the gray values at each pixel. Often only two colors are used to generate a 2-D
histogram because of the difficulty of displaying three dimensions of information by
conventional means. For example, pixels having a particular gray level $f_R$ in the red
image and a level $f_G$ in the green image are counted in bin $[f_R, f_G]$. The resulting
matrix is the image of a 2-D feature space, in which the bin with the largest number of
pixels can be displayed as white and unpopulated bins as black. The histogram can
be used for image segmentation as before, but the extra information obtained from
the second color usually ensures better results.
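The counting procedure just described can be sketched in Python as follows. The function names `histogram_2d` and `to_display` are hypothetical, images are nested lists of integer gray values, and the linear scaling of counts to display gray values is an assumption (a logarithmic scaling might be preferable for real images).

```python
def histogram_2d(band_a, band_b, levels=256):
    """Count pixels in bin [f_a][f_b] for two co-registered integer-valued bands."""
    counts = [[0] * levels for _ in range(levels)]
    for row_a, row_b in zip(band_a, band_b):
        for f_a, f_b in zip(row_a, row_b):
            counts[f_a][f_b] += 1
    return counts


def to_display(counts, white=255):
    """Scale counts so the fullest bin is white and empty bins are black."""
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0] * len(row) for row in counts]
    return [[(white * c) // peak for c in row] for row in counts]
```

Clusters in the resulting array correspond to groups of pixels with similar pairs of gray values, which is what makes the 2-D histogram useful for segmentation.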
15.10.1  2-D  Histograms  of  Simple  Color  Image

[Figure: three 2-D histograms with axes labeled Green, Blue, and Red.] The three 2-D
histograms of each pair of channels of the three color bands. You should be able to
identify the objects in each of the histograms. Four clusters can be identified in the
R-G and five in the G-B histograms.


These 2-D histograms (also called “scatterplots”) of pairs of grayscale images were
plotted using the Hypercube Image Analysis Software (available free for many
platforms from http://www.tec.army.mil/Hypercube/).

The cluster of pixels with large gray values in all images, due to the white pixels
in the original, is easily identified in the three 2-D histograms. You should be able
to identify the clusters that belong to the house, tree, sky, etc. We can segment the
image by thresholding those pixels within certain intervals of red, green, and blue (to
white, say), and thresholding the others to black.

Note that the histogram concept can be extended easily to more than three
dimensions, though visual representation is more difficult. This is the basis for
multispectral segmentation in many areas of image processing.


Segmentation  from  3-D  Histogram 

It is often easier to identify distinct and well-separated clusters from the
multidimensional histogram rather than from the individual images. The segmented
image of the tree obtained from the 3-D histogram is:

[Figure: segmented “tree” image.] Segmentation of “tree” pixels directly from the 3-D
histogram. The black pixels satisfy the three constraints: r > 106, g > 195, and b < 167.
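Segmenting with simultaneous constraints on all three bands amounts to selecting a box in the 3-D histogram. The Python sketch below is a hedged illustration: the names `segment` and `is_tree` are hypothetical, the thresholds are the three quoted in the caption, and the sample pixel values in the usage note are invented.

```python
def segment(red, green, blue, is_member):
    """Return a mask that is 0 (black) where is_member(r, g, b) holds, else 255."""
    return [
        [0 if is_member(r, g, b) else 255 for r, g, b in zip(row_r, row_g, row_b)]
        for row_r, row_g, row_b in zip(red, green, blue)
    ]


def is_tree(r, g, b):
    """The three constraints quoted in the caption for the 'tree' cluster."""
    return r > 106 and g > 195 and b < 167
```

For two invented pixels with (r, g, b) = (120, 200, 100) and (90, 210, 100), `segment([[120, 90]], [[200, 210]], [[100, 100]], is_tree)` marks only the first as belonging to the cluster, because the second fails the red constraint.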


15.10.2  Principal  Component  Analysis  —  PCA 

Reference:  Schowengerdt,  Remote  Sensing 

The information in the different bands of a multispectral image (e.g., an RGB color
image) often is highly correlated, meaning that some or many bands may be visually
similar. In other words, the information content of a multispectral image often is quite
“redundant.” It often is convenient to reduce or even eliminate this redundancy by
expressing the image in terms of “axes” other than the original “bands.” The image
data are transformed to a new coordinate system by rigidly rotating the axes to align
with these other “directions,” and the image data are then projected onto these new axes.
In principal components, the rotation produces a new multispectral image with a
diagonal covariance matrix, so that there is no covariance between the various bands.
One axis of the rotated coordinate system is aligned with the direction of maximum
variance of the image, the second axis is perpendicular to the first and aligned with the
direction of the second largest variance, etc. The bands in the principal component
image are thus arranged in order from largest to smallest variance.
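For two bands, the rotation can be written in closed form, which makes the idea easy to check numerically. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, each band is a flattened list of gray values, the population covariance (division by the number of pixels) is used, and the closed-form angle applies only to the 2-band case; a general implementation would diagonalize the full covariance matrix.

```python
import math


def covariance(u, v):
    """Population covariance of two equally long lists of gray values."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n


def principal_components_2band(band_a, band_b):
    """Rotate two flattened bands onto their principal axes.

    Returns (pc1, pc2, theta), where theta is the rotation angle in radians.
    """
    n = len(band_a)
    mean_a, mean_b = sum(band_a) / n, sum(band_b) / n
    da = [v - mean_a for v in band_a]
    db = [v - mean_b for v in band_b]
    caa, cbb, cab = covariance(da, da), covariance(db, db), covariance(da, db)
    # Angle of the axis of maximum variance for a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2.0 * cab, caa - cbb)
    c, s = math.cos(theta), math.sin(theta)
    # Project the mean-subtracted data onto the rotated axes
    pc1 = [c * x + s * y for x, y in zip(da, db)]
    pc2 = [-s * x + c * y for x, y in zip(da, db)]
    return pc1, pc2, theta
```

After the rotation, `covariance(pc1, pc2)` is zero to within rounding error, and the variances of `pc1` and `pc2` are the two eigenvalues of the original covariance matrix, in descending order.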

To illustrate, consider the principal components of a 2-band image created from
the blue and green bands of the simple color image; the values in the red band were
replaced with zeros. The 2-D histogram, blue vs. green, is shown to locate the
(approximate) axes of the principal components, which were evaluated using
“Hypercube.” Since the third component image was black, only two principal components
are needed to fully represent the image. In the outputs, note that the “lightest” objects
in the image (the white clouds and door) also are lightest in the first PC. The darkest
structure in both channels is the house, which appears “black” in the 1st PC. The
gray values are projected onto the orthogonal axis in the second PC image. The gray
values of the red house pixels and the white pixels belonging to the clouds and door
are projected to the same “mid-gray” value, and thus are indistinguishable in the 2nd
PC (the door has “disappeared into the house”).

[Figure: panels titled “G-B Image (Red Band to Black),” “Red,” “Green,” “Blue,” and “Axes
of Principal Components”; eigenvalues $\lambda_1 = 1696$, $\lambda_2 = 445$, $\lambda_3 = 0$.]
Principal components of a two-band image, computed in “Hypercube”: the image is
created by inserting zeros in the red band. The 2-D histogram blue vs. green is
shown. The first principal component projects the gray values onto the “cyan” axis
that includes most of the image variance. The second principal component projects
the data onto the magenta axis. The images are shown. Note that the first principal
component shows the clouds and door as white and the house as black. The door is
not visible in the second principal component, as its gray value is the same as that of
the house.


The three PCs of the complete RGB image also are shown along with the three
eigenvalues. The first PC of the 3-D scene has a very dark “sky” because it exhibits
the largest contrast relative to the white objects (clouds and door). The 3rd PC
shows the smallest range of variance, which is dominated by image noise.

[Figure: the three principal-component images, with eigenvalues $\lambda_1 = 2673$,
$\lambda_2 = 1299$, $\lambda_3 = 199$.] The three principal components of the RGB
house+tree image. Note that the third component image is quite noisy.


15.11  Time-Sequence  Images:  Video 


The other common example of a 3-D image plots time on the third image axis. Perhaps
the most obvious use is in motion pictures, where the different frames of the movie are
the time samples of the 3-D spatial image $f[x, y, t_n]$. The illusion of continuous motion
is created because of the photochemical response of the rods and cones in the retina.
The time duration of the process is the source of the phenomenon of “persistence of
vision.” The persistence is shorter if the image is brighter (movie projectors have rotary
shutters that show each frame two or even three times). Movies were originally taken
at 16 frames per second, later increased to 24 fps in the US. This is the reason why
motion in old movies appears too fast when they are shown at the later rate. The frame
rate in Europe is 25 fps, which is related to the AC line frequency of 50 Hz. For this
reason, American movies shown in Europe finish more quickly than they do in the US.

The second most familiar example of time-sampled imagery is video, where the
3-D scene $f[x, y, t]$ is sampled in both time and space to convert the scene to a 1-D
function of time $s[t]$. It took 50 or more years to develop the hardware to scan scenes.
The  first  systems  were  mechanical,  based  either  on  a  system  that  scanned  the  light 
reflected  from  an  illuminated  scene,  or  an  illumination  system  that  scanned  a  beam 
of  light  over  the  scene  and  collected  the  reflected  light.  One  of  the  primary  developers 
of  mechanically  scanned  video  was  the  Scotsman  John  Logie  Baird.  The  hardware 
of  electronic  scanning  was  developed  through  the  efforts  of  such  “illuminaries”  as  the 
American  Philo  T.  Farnsworth,  who  demonstrated  a  working  video  system  in  the 
1920s. 

Video  systems  commonly  use  an  “interlaced  scan”  that  alternates  scans  of  the 
even  and  odd  lines  of  the  full  frame,  so  that  the  eye  is  presented  with  half  of  the 
image  information  every  1/60  s.  This  is  less  objectionable  to  the  eye  than  the  original 
“progressive-scanning”  systems  that  presented  a  full  frame  every  1/30  s. 

Note that line numbers 248 to 263 and 511 to 525 are typically blanked to provide
time for the beam to return to the upper left-hand corner for the next scan; other
signals (such as closed captioning or the second audio program) are transmitted during
this “flyback” time.

[Figure: raster diagram labeled “Scan Odd Lines (262½),” “Retrace,” “Scan Even Lines
(262½),” “Retrace.”]

Figure 15.1: Interlaced format of NTSC video raster: one frame in 1/30 second is
composed of two fields, each taking 1/60 second. The first “field” includes 262.5 odd
lines and the second “field” has 262.5 even lines. Lines 248-263 in the first field and
lines 511-525 in the second field are “blanked”; this is the “retrace” time for the
CRT beam and is the time when additional information (e.g., closed captioning) is
transmitted.


15.11.1  Color-Space  Transformations  for  Video  Compression 

References: 

Pratt,  Digital  Image  Processing,  Second  Edition,  §2,3 

Falk,  Brill,  Stork,  Seeing  the  Light,  §10 

Glassner,  Principles  of  Digital  Image  Synthesis,  §1 

Particular color transformations have been developed for use in many different
applications, including various schemes for image transmission. Consider the
transformation used in the color video standard in North America, which was developed
by the National Television System Committee (NTSC). It converts RGB values
to three other channels by a linear transformation: the “luminance” Y (used by
“black-and-white” receivers) and two “chrominance” channels I and Q via:

\[
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.274 & -0.322 \\ 0.211 & -0.523 & 0.312 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix} Y \\ I \\ Q \end{bmatrix}
\]

Note that the sum of the weights applied to the luminance channel Y is
0.299 + 0.587 + 0.114 = 1.0, while the sums of the weights of the two chrominance
channels are both 0. In other words, the luminance channel is a weighted sum of RGB
(analogous to an integral), while the chrominance channels are weighted differences
(similar to derivatives). In the context of linear systems, Y is the result of “spectral
lowpass filtering,” while I and Q are the outputs of what may be loosely described as
“spectral highpass filters.”

If the inputs R, G, and B are in the range [0, 255], then Y lies in the same range.
Both chrominance channels of any gray input pixel (where R = G = B) are zero, and
the range of allowed chrominances is bipolar and fills the available dynamic range,
e.g., −103 < I < 152 and −122 < Q < +133, totaling 256 levels for 8-bit RGB inputs.
The positive polarity of I is reddish (often described as “orange”), and its negative
polarity is “green+blue” or cyan; hence the I information is sometimes called the
“orange-cyan” axis. The positive polarity of Q is “red+blue” or purple, and the
negative polarity is green, so the Q information is the “purple-green” axis.

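The transformation of a “mid-gray” pixel, where the red, green, and blue images
are identically $\alpha$, is:

\[
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.274 & -0.322 \\ 0.211 & -0.523 & 0.312 \end{bmatrix}
\begin{bmatrix} \alpha \\ \alpha \\ \alpha \end{bmatrix}
=
\begin{bmatrix} \alpha \\ 0 \\ 0 \end{bmatrix}
\]

so that the luminance is the gray value of the three colors and the two chrominance
channels are both zero.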
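The forward transformation is a single matrix-vector product per pixel. The short Python sketch below uses the matrix coefficients printed above; the names `RGB_TO_YIQ` and `mat_vec` are hypothetical, and plain lists are used instead of an array library to keep the example self-contained.

```python
# Forward NTSC matrix with the coefficients quoted in the text
RGB_TO_YIQ = [
    [0.299, 0.587, 0.114],
    [0.596, -0.274, -0.322],
    [0.211, -0.523, 0.312],
]


def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix (list of rows) by a 3-element vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]
```

Applying `mat_vec(RGB_TO_YIQ, [128, 128, 128])` returns a luminance of 128 and chrominances that vanish to within rounding error, as the mid-gray equation requires. A saturated red pixel, `[255, 0, 0]`, gives positive I and Q, consistent with the “orange” and “purple” polarities described above.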

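The transformation from YIQ back to RGB is the inverse of the forward matrix
operator:

\[
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.274 & -0.322 \\ 0.211 & -0.523 & 0.312 \end{bmatrix}^{-1}
\cong
\begin{bmatrix} 1.0 & 0.956 & 0.621 \\ 1.0 & -0.273 & -0.647 \\ 1.0 & -1.104 & 1.701 \end{bmatrix}
\]

\[
\begin{bmatrix} 1.0 & 0.956 & 0.621 \\ 1.0 & -0.273 & -0.647 \\ 1.0 & -1.104 & 1.701 \end{bmatrix}
\begin{bmatrix} Y \\ I \\ Q \end{bmatrix}
=
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\quad \text{(rounded)}
\]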


Note  that  the  R ,  G ,  and  B  channels  all  include  100%  of  the  luminance  channel  Y. 
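The inverse matrix and the effect of quantization can both be checked numerically. The Python sketch below is illustrative: all function names are hypothetical, the inverse is computed by the adjugate (cofactor) formula for a 3 × 3 matrix, and rounding to the nearest integer stands in for whatever quantizer a real system would use.

```python
RGB_TO_YIQ = [
    [0.299, 0.587, 0.114],
    [0.596, -0.274, -0.322],
    [0.211, -0.523, 0.312],
]


def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix (list of rows) by a 3-element vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def inverse_3x3(m):
    """Invert a 3x3 matrix with the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def round_trip(rgb, forward, inverse):
    """Forward transform, quantize to integers, invert, and quantize again."""
    quantized = [round(v) for v in mat_vec(forward, rgb)]
    return [round(v) for v in mat_vec(inverse, quantized)]
```

Calling `inverse_3x3(RGB_TO_YIQ)` reproduces the rounded inverse printed above to three decimal places. Passing an 8-bit triplet through `round_trip` shows the caveat about quantization: the recovered values stay close to the originals but are not guaranteed to match them exactly.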

Other color transformations are used in other video systems. The two common
systems used in Europe and Asia are PAL (“Phase Alternation by Line”) and SECAM
(“Séquentiel Couleur à Mémoire”). Each broadcasts 625 lines at 25
frames per second and uses the YUV triad of luminance and chrominance, where the
luminance is the same combination of RGB but the chrominance channels are slightly
different. The RGB transformation to YUV is:

\[
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.148 & -0.289 & 0.437 \\ 0.615 & -0.515 & -0.100 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix} Y \\ U \\ V \end{bmatrix}
\]

The luminance calculations for YIQ and YUV are identical. The ranges of allowed
chrominances are bipolar: for 8-bit RGB inputs, −144 < U < 111 and −98 < V < 157
(each with 256 levels). The NTSC selected the YIQ standard over YUV because tests
indicated that more I-Q data could be discarded without affecting the subjective
image quality.


[Figure: three rows of component images; surviving panel labels Y, U, V.] Color image
decomposition, all images displayed with full dynamic range (i.e., maximum possible
gray value is mapped to white and minimum to black): first row RGB; second row YIQ;
third row YUV.

Obviously there exists an infinite number of invertible 3 × 3 matrix transformations,
and thus of invertible color transformations. The 3-D histograms of a particular
input image before and after transformation generally will differ. Therefore
segmentation of objects with particular similar colors likely will be easier or more
successful in a particular color space.

15.11.2  Multiple-Frame  Averaging


Consider a series of video or movie images of an invariant (i.e., stationary) object
$f[x, y]$ corrupted by additive noise that changes from pixel to pixel and frame to
frame:

\[ g[x, y, t_i] = f[x, y] + n[x, y, t_i] \]

where $n[x, y, t_i]$ is a random number selected from a Gaussian distribution with $\mu = 0$.
The additive noise will tend to obscure the “true” image structure $f[x, y]$. One
common problem in image processing is to enhance the visibility of the invariant objects
in the image by attenuating the noise. If the gray values $n$ are truly random, i.e., all
values of $n$ in the range $(-\infty, \infty)$ can exist with equal likelihood, then little can be
done to improve the image. Fortunately, in realistic imaging problems the probability
of each value of $n$ is determined by some probability density function (histogram) $p[n]$
and we say that the noise is stochastic. The most common probability distributions
in physical or imaging problems are the uniform, Poisson, and normal. Less common,
but still physically realizable, distributions are the Boltzmann (negative exponential)
and Lorentzian. A general discussion of stochastic functions is beyond the scope of
the immediate discussion, though we will go into more detail later while reviewing
statistical filters. Interested readers should consult Frieden’s book Probability,
Statistical Optics, and Data Testing (Springer-Verlag, 1991) for detailed discussions
of physically important stochastic processes. At this point, we will state without
proof that, as a consequence of the central limit theorem, the most common probability
density function in physical problems is the normal distribution $N[\mu, \sigma^2]$. The
histogram of noise gray values in the normal distribution is:

\[
p[n] = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(n - \mu)^2}{2\sigma^2} \right] = N(\mu, \sigma^2)
\]

The normal distribution is completely characterized by two parameters: the mean
value $\mu$ and the variance $\sigma^2$ (or its equivalent, the standard deviation $\sigma$). This is the
equation for the familiar bell curve with maximum value located at $n = \mu$ and with
full width of $2\sigma$ measured at approximately 60% of the maximum amplitude. In the
special case where the mean value $\mu = 0$, the normal distribution commonly is called
a Gaussian distribution. The remainder of the discussion in this section will assume
that the additive noise has been selected at random from a normal distribution.


It is probably clear intuitively that an image created by averaging a collection of
noise images $n[x, y, t_i]$ over time will tend toward a uniform image whose gray level is
the mean $\mu$ of the noise, i.e., the variations in gray level about the mean will “cancel
out”:

\[
\frac{1}{N} \sum_{i=1}^{N} n[x, y, t_i] \cong \mu \cdot 1[x, y]
\]


If the sequence of input images includes an invariant object on a background of
additive normal noise, the visibility of the object will be enhanced in the average
image:

\[
\begin{aligned}
\frac{1}{N} \sum_{i=1}^{N} \left( f[x, y] + n[x, y, t_i] \right)
&= \frac{1}{N} \sum_{i=1}^{N} f[x, y] + \frac{1}{N} \sum_{i=1}^{N} n[x, y, t_i] \\
&\cong f[x, y] + \mu \cdot 1[x, y]
\end{aligned}
\]

This result is proven directly below, but may be seen more easily if the reader is
familiar with the statistical concept that the probability density function of the sum
of N random variables is the N-fold convolution of the individual probability density
functions.
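Frame averaging is simple to simulate. The Python sketch below is a minimal illustration, not a model of any particular camera: the function names, the tiny test image, the noise level, and the fixed random seed are all assumptions chosen for reproducibility, and the noise is drawn with the standard-library `random.Random.gauss`.

```python
import random


def noisy_frames(image, count, sigma, mu=0.0, seed=0):
    """Generate `count` copies of `image`, each with independent normal noise added."""
    rng = random.Random(seed)
    return [
        [[value + rng.gauss(mu, sigma) for value in row] for row in image]
        for _ in range(count)
    ]


def average_frames(frames):
    """Average a list of equally sized frames pixel by pixel."""
    count = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(cols)]
        for y in range(rows)
    ]
```

Averaging a few hundred frames generated by `noisy_frames` from a small image returns gray values close to the noise-free image plus the noise mean, whereas any single frame deviates from it by an amount comparable to the noise standard deviation.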


To quantify the visibility of the object in a noisy image, it is necessary to quantify
the visibility of the noise, i.e., the variability of gray level due to the stochastic signal.
The average gray value due to noise is:

\[
\langle n[x, y] \rangle = \int_{-\infty}^{+\infty} n[x, y] \, p[n] \, dn = \mu
\]

where $\langle \; \rangle$ denotes the averaged value of the quantity and $p[n]$ is the probability
density function of the noise. The mean value obviously is not an appropriate measure
because it does not describe the variability of gray value. A useful quantity is the
variance of the noise, which is the average of the squared difference between the gray
value due to the noise and the mean:


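$$\begin{aligned}
\sigma^2[n] &= \left\langle (n - \mu)^2 \right\rangle = \int_{-\infty}^{+\infty} (n - \mu)^2\, p[n]\, dn \\
&= \int_{-\infty}^{+\infty} \left(n^2 - 2n\mu + \mu^2\right) p[n]\, dn \\
&= \int_{-\infty}^{+\infty} n^2\, p[n]\, dn - 2\mu \int_{-\infty}^{+\infty} n\, p[n]\, dn + \mu^2 \int_{-\infty}^{+\infty} p[n]\, dn \\
&= \int_{-\infty}^{+\infty} n^2\, p[n]\, dn - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
&= \left\langle n^2 \right\rangle - 2\mu^2 + \mu^2
\end{aligned}$$

$$\sigma^2[n] = \left\langle n^2 \right\rangle - \mu^2$$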
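The identity can be checked numerically. The sketch below is only an illustration: the sample size, the mean of 3, and the standard deviation of 2 are arbitrary assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical noise samples: mean 3, standard deviation 2 (variance 4)
n = rng.normal(loc=3.0, scale=2.0, size=200_000)

mu = n.mean()
var_direct = np.mean((n - mu) ** 2)        # <(n - mu)^2>
var_identity = np.mean(n ** 2) - mu ** 2   # <n^2> - mu^2
```

Both estimates agree to within floating-point rounding, and both approach the true variance as the number of samples grows.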

A measure of the relative visibility of the signal and the noise is the ratio of the signal
to the noise variance, and is often called the signal-to-noise ratio (S/N or SNR). It
may be expressed as power or amplitude:

$$\text{Power SNR} = \frac{f^2[x, y]}{\sigma^2[n]}$$

$$\text{Amplitude SNR} = \frac{f[x, y]}{\sigma[n]}$$
After averaging P frames of signal plus noise, the gray level at the pixel [x, y] will
approach:

$$g[x, y] = \sum_{i=1}^{P} \frac{f[x, y] + n[x, y, t_i]}{P} = \sum_{i=1}^{P} \frac{f[x, y]}{P} + \sum_{i=1}^{P} \frac{n[x, y, t_i]}{P} = f[x, y] + \mu$$


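The variance of the noise at pixel [x, y] after averaging P frames is:

$$\begin{aligned}
\sigma^2[\bar{n}] &= \left\langle \left(\bar{n}[x, y]\right)^2 \right\rangle - \mu^2 = \left\langle \left( \sum_{i=1}^{P} \frac{n[x, y, t_i]}{P} \right)^2 \right\rangle - \mu^2 \\
&= \frac{1}{P^2} \left\langle \sum_{i=1}^{P} n[x, y, t_i] \sum_{j=1}^{P} n[x, y, t_j] \right\rangle - \mu^2 \\
&= \frac{1}{P^2} \sum_{i=1}^{P} \left\langle \left(n[x, y, t_i]\right)^2 \right\rangle + \frac{2}{P^2} \sum_{i>j} \left\langle n[x, y, t_i] \cdot n[x, y, t_j] \right\rangle - \mu^2
\end{aligned}$$

where n̄[x, y] denotes the P-frame average of the noise at that pixel.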


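If the noise values are selected from the same distribution, all terms in the first sum
on the right are identical:

$$\frac{1}{P^2} \sum_{i=1}^{P} \left\langle \left(n[x, y, t_i]\right)^2 \right\rangle = \frac{1}{P^2} \sum_{i=1}^{P} \left(\sigma_i^2 + \mu^2\right) = \frac{1}{P^2} \left( P \cdot \left(\sigma_i^2 + \mu^2\right) \right) = \frac{\sigma_i^2 + \mu^2}{P}$$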

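Because the noise values are uncorrelated by assumption, each average in the second
sum on the right is just the square of the mean, ⟨n[x, y, t_i] · n[x, y, t_j]⟩ = μ² for
i ≠ j, and there are P(P − 1)/2 pairs with i > j:

$$\frac{2}{P^2} \sum_{i>j} \left\langle n[x, y, t_i] \cdot n[x, y, t_j] \right\rangle = \frac{2}{P^2} \cdot \frac{P(P-1)}{2} \cdot \mu^2 = \frac{P-1}{P}\, \mu^2$$

The variance of the average image, where σ_i² is the variance of the noise in a single
frame, is:

$$\sigma^2[\bar{n}] = \frac{\sigma_i^2 + \mu^2}{P} + \frac{P-1}{P}\, \mu^2 - \mu^2 = \frac{\sigma_i^2}{P}$$

The terms in μ² cancel, so the result holds for any mean value, including the common
case of zero-mean noise (μ = 0): the variance of the average is reduced by a factor of P
and the standard deviation is reduced by √P:

$$\sigma^2[\bar{n}] = \frac{\sigma_i^2[n]}{P}$$

$$\sigma[\bar{n}] = \sqrt{\sigma^2[\bar{n}]} = \frac{1}{\sqrt{P}}\, \sigma_i[n]$$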
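The amplitude SNR of the averaged image is:

$$SNR_{out} = \frac{f[x, y]}{\sqrt{\sigma^2[\bar{n}]}} = \frac{f[x, y]}{\sigma_i[n] / \sqrt{P}} = \sqrt{P} \cdot \frac{f[x, y]}{\sigma_i[n]} = \sqrt{P} \cdot SNR_{in}$$

$$SNR_{out} = \sqrt{P} \cdot SNR_{in}$$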


Thus the effect of averaging multiple frames which include uncorrelated additive noise
from a Gaussian distribution is to decrease the width of the histogram of the noise by
a factor of √P, which increases the signal-to-noise ratio of the image by √P. For
example, a video image from a distant TV station often is contaminated by random
noise (“snow”). If the image is invariant over time, its signal-to-noise ratio can be
increased by averaging; if 90 frames of video (= 3 sec) are averaged, the SNR of the
output image will increase by a factor of √90 ≅ 9.5.
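The √P improvement can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: the uniform 128 x 128 test image, the noise level, and the helper name `average_frames` are hypothetical choices for the example.

```python
import numpy as np

def average_frames(f, sigma, P, rng):
    """Average P noisy copies of the static image f (additive zero-mean Gaussian noise)."""
    frames = f + rng.normal(0.0, sigma, size=(P,) + f.shape)
    return frames.mean(axis=0)

rng = np.random.default_rng(1)
f = np.full((128, 128), 100.0)   # hypothetical invariant "object"
sigma_in = 10.0                  # noise standard deviation of a single frame
P = 90                           # number of frames averaged

g = average_frames(f, sigma_in, P, rng)
sigma_out = (g - f).std()        # residual noise in the averaged image
gain = sigma_in / sigma_out      # improvement in amplitude SNR
```

With uncorrelated noise the measured `gain` should lie close to √90; correlated noise would give a smaller value.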


If the noise is correlated to some degree from frame to frame (i.e., its spatial
structure is partially correlated from image to image), then averaging will not improve
the SNR so rapidly. For example, consider imaging of a submerged object through a
water surface. Wave motion will distort the images but there will be some correlation
between frames. The SNR might improve only as, say, P^(1/4) ≅ 3 for P = 90 frames.
Averaging of independent noise samples: (a) signal f[x]; (b) one realization of
Gaussian noise n_1[x] with μ = 0 and σ = 1; (c) f[x] + n_1[x]; (d) the average of
several realizations f[x] + n_i[x], showing the improved signal-to-noise ratio of the
averaged image.


15.11.3 Required Number of Bits for Image Sums, Averages,
and Differences


If two 8-bit images are added, then the gray value of the output image lies in the
interval [0, 510], with 511 gray values requiring 9 bits of data to represent fully. If
constrained to 8 useful bits of data in the output image, half of the gray-scale data
in the summation must be discarded. In short, it is necessary to requantize the
summation image. If two 8-bit images are averaged, then the gray values of the
resulting image are in the interval [0, 255], but half-integer values are virtually
guaranteed, thus ensuring that the average image also has 9 bits of data unless
requantized. The central limit theorem indicates that the histogram of a summation or
average image should approach a Gaussian form.
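The bit counts above can be verified with a few lines of NumPy. The arrays and the helper `bits_needed` are hypothetical; the sketch only illustrates why the sum must be held in a wider data type before requantizing.

```python
import numpy as np

def bits_needed(num_levels):
    """Smallest number of bits that can index num_levels distinct gray values."""
    return (num_levels - 1).bit_length()

a = np.array([[255, 10], [0, 7]], dtype=np.uint8)
b = np.array([[255, 11], [0, 8]], dtype=np.uint8)

total = a.astype(np.uint16) + b               # widen first to avoid 8-bit wraparound
average = total / 2.0                         # half-integer values appear (10.5, 7.5)
requantized = (total >> 1).astype(np.uint8)   # drop the least-significant bit: 8 bits again
```

Here `bits_needed(511)` evaluates to 9, matching the 511 possible values of the sum, while `requantized` fits in 8 bits at the cost of the half-integer information.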
15.12 Image Subtraction for Change Detection

Subtracting  images  of  the  same  scene  recorded  at  different  times  will  highlight  pixels 
whose  gray  value  has  changed  in  the  interval: 


$$g[x, y] = f[x, y, t_1] - f[x, y, t_0]$$

Invariant pixels will subtract to 0; pixels that have brightened will have positive gray
level, and pixels that have dimmed will have negative gray level. This technique is
applied to motion/change detection and may be interpreted as a discrete version of the
time derivative.

$$\frac{\partial f[x, y, t]}{\partial t} = \lim_{\Delta t \to 0} \frac{f[x, y, t + \Delta t] - f[x, y, t]}{\Delta t}$$

For multitemporal digital images, the smallest nonvanishing time interval Δt is the
interval between frames (Δt = t_1 − t_0). The time derivative is the difference image of
adjacent frames:

$$\frac{\partial f[x, y, t]}{\partial t} \to \frac{f[x, y, t_0 + (t_1 - t_0)] - f[x, y, t_0]}{t_1 - t_0} = \frac{f[x, y, t_1] - f[x, y, t_0]}{t_1 - t_0} \propto f[x, y, t_1] - f[x, y, t_0]$$

Note that the difference image g[x, y] is bipolar (white > 0, gray = 0, black < 0) and
must be scaled to fit the available discrete dynamic range [0, 255] for display. Often
g = 0 is mapped to the mid-level gray (e.g., 127), the maximum negative level is mapped
to 0, the brightest level to 255, and the intervening grays are linearly compressed to
fit.
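One way to realize this display mapping for 8-bit frames is sketched below. The function name `difference_for_display` and the specific linear map (g + 255) // 2 are illustrative assumptions; other scalings that send 0 to mid-gray would serve equally well.

```python
import numpy as np

def difference_for_display(f1, f0):
    """Signed difference of two 8-bit frames, compressed to [0, 255] for display."""
    g = f1.astype(np.int16) - f0.astype(np.int16)   # bipolar, range -255..+255
    display = ((g + 255) // 2).astype(np.uint8)     # -255 -> 0, 0 -> 127, +255 -> 255
    return g, display

f0 = np.array([[0, 100, 255]], dtype=np.uint8)   # earlier frame
f1 = np.array([[255, 100, 0]], dtype=np.uint8)   # later frame
g, display = difference_for_display(f1, f0)
```

The signed array `g` keeps the full 9-bit difference for analysis, while `display` is the compressed version suitable for an 8-bit monitor.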


[Figure panels: f[x, y, t_1]; f[x, y, t_2], with t_2 ≅ t_1 + 3 sec; the difference
f[x, y, t_2] − f[x, y, t_1]. Extrema ⇒ translations.]

Difference of two images of the same scene taken approximately 3 seconds apart.
The images are from a movie taken by John Noel during the Mallory Everest
Expedition of 1924. One image has been translated by a small distance, thus
creating “edge” regions. Differences of zero are mapped to “midgray”, while regions
where the second image is brighter or darker map to white and black, respectively.


In this example, the difference image may be used to help determine how well
registered the two images are; if perfectly registered, only pixels that have really
changed between images will have gray values other than 0.


15.13  Difference  Images  as  Features 

In feature extraction and recognition applications (e.g., remote sensing), linear
combinations (i.e., sums and differences) of multispectral imagery may supply additional
useful information for classification. For example, the difference image of spectral
band 3 (red) and 4 (infrared) (out of a total of seven bands) imaged by LANDSAT
helps classify urban vs. rural features in the image. In the simple house-tree image,
the visibility of the house is enhanced in Red-Green and Blue-Red images.


[Figure panels: Green - Red; Red - Blue; Blue - Green.]

Color difference images: the “house” is more obvious in the green-red image, and
the tree in the red-blue image.

Note that the difference images are noticeably noisier, especially Blue-Green;
difference images enhance all variations in gray value, whether desired or not. The
concerns about displaying bipolar gray values of time derivatives exist here as well.

15.13.1  Number  of  Bits  in  Difference  Image 

If two 8-bit images (with gray values f_n satisfying the constraint 0 ≤ f_n ≤ 255)
are subtracted, then the gray values of the resulting image g[x, y] lie in the range
−255 ≤ g ≤ +255, for a total of 511 possible values requiring 9 bits of data. To
obtain 8 useful bits of data in the output image, half of the data in the difference
image must be discarded (usually the least-significant bit, though if the pixels are
well correlated, then the most-significant bit may be zero).

The principles of statistical distributions tell us that the distribution of gray values
in the difference of two images, with data generated from two independent statistical
distributions, is the convolution of the two distributions (one of them reversed),
centered about the difference in the mean values.
15.14 “Mask” or “Template” Multiplication

Pixel-by-pixel multiplication is an essential part of local neighborhood and global
image operations, e.g., the Fourier transform. It also is useful to mask out sections of
an image, perhaps to be replaced by objects from other images. This is occasionally
useful in segmentation and pattern recognition, but is essential in image synthesis,
such as for special effects in movies.


15.15  Image  Division 

Images recorded by a system with spatially nonuniform response are functions of both
the input distribution f[x, y] and the spatial sensitivity curve s[x, y], 0 ≤ s[x, y] ≤ 1:

$$g[x, y] = f[x, y] \cdot s[x, y]$$

This is a deterministic multiplicative degradation of the image; the image may be
restored by dividing out the sensitivity function. An estimate of the true image
brightness can be computed at each pixel by division:


$$\hat{f}[x, y] = \frac{g[x, y]}{s[x, y]} = f[x, y]$$

(n.b., no image information is recorded at pixels where s[x, y] = 0 and thus the
true value cannot be recovered at those pixels). This technique has been applied to
remotely sensed imagery where information about image brightness is critical. Note
that errors in the sensitivity function greatly distort the recovered signal. Similarly,
additive noise creates big problems in image division.


15.15.1  Image  Division  to  Correct  Spatial  Sensitivity 

Imaging  systems  often  suffer  from  a  consistent  multiplicative  error.  One  very  common 
example  is  the  variation  in  sensitivity  of  the  pixels  in  a  CCD  sensor  due  to  the  intrinsic 
variability  in  the  substrate  properties  or  manufacturing.  These  multiplicative  errors 
are  measured  and  corrected  via  a  subsequent  division. 

Consider a biased sine wave f[x] recorded by an imaging system whose sensitivity
falls off away from the origin. The image may be recovered completely if the sensitivity
curve is nonzero everywhere and there is no noise in either the recorded signal or the
estimate of the sensitivity.
1-D simulation of spatial compensation for sensitivity correction: (a) original signal
f[x] is a biased sinusoid; (b) sensitivity function s[x]; (c) g[x] = f[x] · s[x]; (d)
correction f̂[x] = g[x] / s[x].


Noise in the estimate of the sensitivity results in distortion of the recovered signal;
the effect is more severe where the SNR is low. The standard deviation of the added
noise is 0.005.
Spatial correction in presence of noise: (a) g[x] + n[x], where n[x] is zero-mean
Gaussian noise with σ = 0.005; (b) f̂[x] = (g[x] + n[x]) / s[x], showing the large
errors due to incorrect division of two small values.
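The 1-D simulation can be sketched as follows. The particular sinusoid, the Gaussian-shaped sensitivity curve, and the sampling grid are assumptions made for illustration; only the noise level σ = 0.005 is taken from the text, and here the noise is added to the recorded signal.

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 401)
f = 1.0 + 0.5 * np.cos(2 * np.pi * x)    # hypothetical biased sinusoid
s = np.exp(-x ** 2)                       # hypothetical sensitivity, falls off from the origin
g = f * s                                 # recorded (degraded) signal

f_hat_clean = g / s                       # noise-free case: the signal is recovered

rng = np.random.default_rng(2)
n = rng.normal(0.0, 0.005, size=x.shape)  # zero-mean Gaussian noise, sigma = 0.005
f_hat_noisy = (g + n) / s                 # the error term is n / s

err = np.abs(f_hat_noisy - f)
center_err = err[np.abs(x) < 1.0].max()   # region where the sensitivity is large
edge_err = err[np.abs(x) > 3.0].max()     # region where the sensitivity is nearly zero
```

Because the error is n[x] / s[x], it stays small near the origin and grows very large toward the edges where s[x] is tiny, which is the behavior described in the caption.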

15.15.2  Image  Division  to  Enhance  Low-Contrast  Imagery 

Astronomers often need to examine fine structure (i.e., significant brightness
differences) that is hidden by a larger-scale brightness gradient of the object.
Included among the significant low-contrast structural features are streamers in the
solar corona that radiate outward from the solar surface; the brightness details
provide clues about the physical nature of the solar atmosphere and magnetic fields.
However, the radial brightness gradient of the corona makes it very difficult to image
the full length of the coronal streamers. The overall dynamic range of ground-based
imagery of the solar corona is approximately three orders of magnitude (limited by
atmospheric sky brightness). Imagery from air- or spacecraft may add an order of
magnitude for a total dynamic range of 10⁴. This may be recorded on a wide-range
photographic negative (ΔD > 4), but it is not possible to print that dynamic range
by normal photographic techniques.

The problem of enhancing the visibility of small changes in coronal image brightness
is similar to correction of detector sensitivity just considered and may be attacked
by digital methods. The brightness gradient of the corona is analogous to the 1-D
sensitivity function s[x]; division of the original image by the brightness gradient
“equalizes” or “flattens” the gray-level variations in the background, thus allowing
the smaller-scale variations across the image to be displayed on a device with limited
dynamic range. An estimate of the coronal brightness gradient may be determined by
averaging fitted curves of the radial brightness or by making a low-resolution (blurred)
image. The latter is more commonly used, as it may be derived via simple local
neighborhood digital operations to be considered next. The recovered image is the
ratio of the measured high-resolution image and the image of the brightness gradient.
This technique may be applied to archival photographic negatives of historical eclipses
to provide additional information about the history of solar conditions and thus their
effects on the earth’s climate.
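A 1-D sketch of division by a blurred replica is given below. The power-law falloff, the 5% sinusoidal "streamer" detail, the 75-sample window, and the helper `box_blur_1d` are all assumptions chosen for illustration, not values from the cited work.

```python
import numpy as np

def box_blur_1d(signal, width):
    """Uniform moving average with edge replication (a simple lowpass filter)."""
    pad = width // 2
    padded = np.pad(signal, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")

r = np.linspace(1.0, 4.0, 600)                     # hypothetical radial coordinate
gradient = r ** -6.0                               # assumed steep large-scale falloff
detail = 1.0 + 0.05 * np.sin(2 * np.pi * 8 * r)    # 5% fine structure
image = gradient * detail

background = box_blur_1d(image, 75)                # low-resolution estimate of the gradient
flattened = image / background                     # ratio image

range_before = image.max() / image.min()           # thousands to one
range_after = flattened.max() / flattened.min()    # of order unity
```

The ratio image retains the fine structure but spans a far smaller dynamic range, so it can be shown on a display with limited range.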
Examples of Image Division for Local Contrast Enhancement

Examples  of  recovered  information  from  old  imagery  of  a  solar  eclipse  and  a  comet  are 
shown.  The  pictures  are  from  Enhancement  of  Solar  Corona  and  Comet  Details  by 
Matuska,  et  al,  Proc.  SPIE,  119,  pp.  28-35,  1977  and  in  Optical  Engineering, 
17(6),  661-665,  1978.  The  images  are  scans  of  xerographic  prints  from  microfilm, 
which  is  the  reason  for  the  poor  quality  (courtesy  of  Wallace  Memorial  Library!) 


Average of four “raw” images of the 1973 solar eclipse, showing the corona with
little discernible detail.


Image of solar corona obtained by
“pseudounsharp masking”; this is the
ratio of the original average of four
images and a lowpass-filtered
replica.


Image  of  solar  corona  obtained  by 
dividing  the  original  image  by  a 
lowpass-filtered  replica  and  following 
with  highpass  filtering  to  “sharpen”  the 
fine  detail. 
Original image of comet 1957V
(Mrkos)


Enhanced image of Comet Mrkos via
“pseudounsharp” masking.


Chapter  16 

Local  Operations 


$$g[x, y] = \mathcal{O}\{f[x \pm \Delta x,\, y \pm \Delta y]\}$$

In many common image processing operations, the output pixel is a weighted
combination of the gray values of pixels in the neighborhood of the input pixel, hence
the term local neighborhood operations. The size of the neighborhood and the pixel
weights determine the action of the operator. This concept has already been
introduced when we considered image prefiltering during the discussion of realistic
image sampling. It will now be formalized and will serve as the basis of the discussion
of image transformations.


Schematic of a local operation applied to the input image f[x, y] to create the output
image g[x, y]. The local operation weights the gray values in the neighborhood of the
input pixel.
16.1 Window Operators - Correlation


You probably have already been exposed to window operations in the course on linear
systems. An example of a window operator acting on the 1-D continuous function
f[x] is:

$$\mathcal{O}\{f[x]\} = \int_{-\infty}^{+\infty} f[\alpha]\, \gamma[\alpha - x]\, d\alpha$$

The resulting function of x is the area of the product of two functions of α: the
input f and a second function γ that has been translated (shifted) by the distance x.
Different results are obtained by substituting different functions γ[x].

The process may be recast in a different form by defining a new variable of
integration u = α − x:

$$\int_{\alpha=-\infty}^{+\infty} f[\alpha]\, \gamma[\alpha - x]\, d\alpha = \int_{u=-\infty}^{+\infty} f[x + u]\, \gamma[u]\, du$$

which differs from the first expression in that the second function γ[u] remains fixed
in position and the input function f is shifted, now by −x. If the amplitude of the
function γ is zero outside some interval in this second expression, then the integral
need be computed only over the region where γ[u] ≠ 0. The region where the function
γ[x] is nonzero is called the support of γ, and functions that are nonzero over only a
finite domain are said to exhibit finite or compact “support.”


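The 2-D versions of these expressions are:

$$\begin{aligned}
\mathcal{O}\{f[x, y]\} &= \iint_{-\infty}^{+\infty} f[\alpha, \beta]\, \gamma[\alpha - x,\, \beta - y]\, d\alpha\, d\beta \\
&= \iint_{-\infty}^{+\infty} f[x + u,\, y + v]\, \gamma[u, v]\, du\, dv.
\end{aligned}$$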


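The analogous process for sampled functions requires that the integral be converted
to a discrete summation:

$$\begin{aligned}
g[n, m] &= \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} f[i, j]\, \gamma[i - n,\, j - m] \\
&= \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} f[i + n,\, j + m]\, \gamma[i, j].
\end{aligned}$$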

In words, this process scales the shifted function by the values of the matrix γ,
and thus computes a weighted summation of gray values of the input image f[n, m].
The operation defined by this last equation is called the crosscorrelation of the image
with the window function γ[n, m]. The correlation operation often is denoted by a
five-pointed star (“pentagram”), e.g.,

$$g[n, m] = f[n, m] \star \gamma[n, m] = \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} f[i, j]\, \gamma[i - n,\, j - m]$$

The output image g at the pixel indexed by [n, m] is computed by centering the window
γ[n, m] on that pixel of the input image f[n, m], multiplying the window and input
image pixel by pixel, and summing the products. This operation produces an output
extremum at shifts [n, m] where the gray-level pattern of the input matches that of
the window.

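In the common case where the sampled function γ is zero outside a domain with
compact support of size 3 x 3 samples, the function may be written in the form of a
3 x 3 matrix or window function:

$$\gamma[n, m] = \begin{bmatrix}
\gamma_{-1,1} & \gamma_{0,1} & \gamma_{1,1} \\
\gamma_{-1,0} & \gamma_{0,0} & \gamma_{1,0} \\
\gamma_{-1,-1} & \gamma_{0,-1} & \gamma_{1,-1}
\end{bmatrix}$$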

16.1.1  Examples  of  3  x  3  Crosscorrelation  Operators 

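Consider the action of these 3 x 3 window functions:

$$\gamma_1[n, m] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & +1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad
\gamma_2[n, m] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & +2 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad
\gamma_3[n, m] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & +1 \\ 0 & 0 & 0 \end{bmatrix}$$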

• γ1: the only pixel that influences the output g[n, m] is the identical pixel in the
input f[n, m] - this is the identity operator.

• γ2: the output pixel has twice the gray value of the input pixel - this is a
uniform contrast stretching operator.

• γ3: the output pixel is identical to its right-hand neighbor in the input image -
this operator translates the image one pixel to the left.

Once the general crosscorrelation algorithm is programmed, many useful operations
on the image f[n, m] can be performed simply by specifying different values for
the window coefficients.
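A general 3 x 3 crosscorrelation routine of this kind can be sketched in a few lines. The function name `crosscorrelate3x3`, the array layout (rows down, columns to the right, window written as displayed), and the choice of zeros beyond the border are assumptions of this example.

```python
import numpy as np

def crosscorrelate3x3(f, window):
    """g[r, c] = sum over a, b in {-1, 0, 1} of f[r + a, c + b] * window[a + 1, b + 1].

    Pixels beyond the image border are taken to be zero (one of several possible choices).
    """
    padded = np.pad(f.astype(float), 1, mode="constant")
    rows, cols = f.shape
    g = np.zeros((rows, cols))
    for a in range(3):
        for b in range(3):
            # each window element weights a shifted copy of the input
            g += window[a, b] * padded[a:a + rows, b:b + cols]
    return g

gamma1 = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])   # identity
gamma2 = np.array([[0, 0, 0], [0, 2, 0], [0, 0, 0]])   # doubles every gray value
gamma3 = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]])   # copies the right-hand neighbor

f = np.arange(12).reshape(3, 4)
g1 = crosscorrelate3x3(f, gamma1)
g2 = crosscorrelate3x3(f, gamma2)
g3 = crosscorrelate3x3(f, gamma3)
```

Only the window coefficients change between the three calls; `g3` is the input translated one pixel to the left, with zeros entering at the right edge.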


16.2  Convolution 


A mathematically equivalent but generally more convenient neighborhood operation is
the convolution, which has some very nice mathematical properties. The convolution
of two 1-D continuous functions, the input f[x] and the impulse response (or kernel,
or point spread function, or system function) h[x], is:

$$g[x] = f[x] * h[x] \equiv \int_{-\infty}^{+\infty} d\alpha\; f[\alpha]\, h[x - \alpha]$$

where α is a dummy variable of integration. As for the crosscorrelation, the function
h[x] defines the action of the system on the input f[x]. By changing the integration
variable to u = x − α, an equivalent expression for the convolution is found:

$$\begin{aligned}
g[x] &= \int_{\alpha=-\infty}^{+\infty} f[\alpha]\, h[x - \alpha]\, d\alpha \\
&= \int_{u=+\infty}^{-\infty} f[x - u]\, h[u]\, (-du) \\
&= \int_{u=-\infty}^{+\infty} f[x - u]\, h[u]\, du \\
&= \int_{-\infty}^{+\infty} h[\alpha]\, f[x - \alpha]\, d\alpha
\end{aligned}$$

where the dummy variable was renamed from u to α in the last step. Note that
the roles of f[x] and h[x] have been exchanged between the first and last
expressions, which means that the input function f[x] and system function h[x] can be
interchanged.

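The convolution of a continuous 2-D function f[x, y] with a system function h[x, y]
is denoted by an asterisk and defined as:

$$\begin{aligned}
g[x, y] = f[x, y] * h[x, y] &\equiv \iint_{-\infty}^{+\infty} f[\alpha, \beta]\, h[x - \alpha,\, y - \beta]\, d\alpha\, d\beta \\
&= \iint_{-\infty}^{+\infty} f[x - \alpha,\, y - \beta]\, h[\alpha, \beta]\, d\alpha\, d\beta
\end{aligned}$$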

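Note the difference between the first forms for the convolution and the
crosscorrelation:

$$\begin{aligned}
f[x, y] \star \gamma[x, y] &= \iint_{-\infty}^{+\infty} f[\alpha, \beta]\, \gamma[\alpha - x,\, \beta - y]\, d\alpha\, d\beta \\
f[x, y] * h[x, y] &= \iint_{-\infty}^{+\infty} f[\alpha, \beta]\, h[x - \alpha,\, y - \beta]\, d\alpha\, d\beta
\end{aligned}$$

and between the second forms:

$$\begin{aligned}
f[x, y] \star \gamma[x, y] &= \iint_{-\infty}^{+\infty} f[x + u,\, y + v]\, \gamma[u, v]\, du\, dv \\
f[x, y] * h[x, y] &= \iint_{-\infty}^{+\infty} f[x - \alpha,\, y - \beta]\, h[\alpha, \beta]\, d\alpha\, d\beta
\end{aligned}$$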

The change of the order of the variables in the first pair says that the function γ
is just shifted before multiplying by f in the crosscorrelation, while the function h
is flipped about its center (or equivalently rotated about the center by 180°) before
shifting. In the second pair, the difference in sign of the integration variables says
that the input function f is shifted in different directions before multiplying by the
system function γ for crosscorrelation and h for convolution. In convolution, it is
common to speak of filtering the input f with the kernel h. For discrete functions,
the convolution integral becomes a summation:


$$g[n, m] = f[n, m] * h[n, m] = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} f[i, j]\, h[n - i,\, m - j].$$

Again, note the difference in algebraic sign of the action of the kernel h[n, m] in
convolution and the window γ[n, m] in correlation:

$$\begin{aligned}
f[n, m] \star \gamma[n, m] &= \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} f[i, j]\, \gamma[i - n,\, j - m] \\
f[n, m] * h[n, m] &= \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} f[i, j]\, h[n - i,\, m - j]
\end{aligned}$$

This form of the convolution has the very useful property that convolution of an input
in the form of an impulse function δ_d[n, m] with h[n, m] yields h[n, m], hence the
name for h as the impulse response:

$$\delta_d[n, m] * h[n, m] = h[n, m]$$

where the discrete Dirac delta function (the Kronecker delta) δ_d[n, m] is defined:

$$\delta_d[n, m] \equiv \begin{cases} 1 & \text{if } n = m = 0 \\ 0 & \text{otherwise} \end{cases}$$
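The relation between the two operations, and the impulse-response property, can be checked with a small sketch. The helper names `correlate3x3` and `convolve3x3`, the zero padding, and the test kernel with entries 1 through 9 are assumptions of this example.

```python
import numpy as np

def correlate3x3(f, w):
    """3 x 3 crosscorrelation with zeros assumed beyond the image border."""
    padded = np.pad(f.astype(float), 1, mode="constant")
    rows, cols = f.shape
    return sum(w[a, b] * padded[a:a + rows, b:b + cols]
               for a in range(3) for b in range(3))

def convolve3x3(f, h):
    """Convolution is crosscorrelation with the kernel rotated by 180 degrees."""
    return correlate3x3(f, h[::-1, ::-1])

h = np.arange(1.0, 10.0).reshape(3, 3)    # asymmetric test kernel
delta = np.zeros((5, 5))
delta[2, 2] = 1.0                         # discrete impulse at the center pixel

conv_out = convolve3x3(delta, h)          # replica of h centered on the impulse
corr_out = correlate3x3(delta, h)         # replica of h rotated by 180 degrees
```

Convolving the impulse reproduces `h` itself, whereas crosscorrelating it returns the flipped kernel; this is why h, and not γ, is called the impulse response.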


16.2.1  Evaluating  Discrete  Convolutions 


1-D 


Schematic of the sequence of calculations in 1-D discrete convolution. The 3-pixel
kernel h[n] = [1 2 3] is convolved with the input image that is 1 at one pixel
and zero elsewhere. The output is a replica of h[n] centered at the location of the
impulse.
2-D

Schematic of 2-D discrete convolution with the 2-D kernel h[n, m].

$$\delta_d[i - n,\, j - m] * h[n, m] = h[n, m]$$

Examples of 2-D Convolution Kernels

[Figure: example kernels labeled “identity” and “shifts image one pixel to right”.]

Discrete convolution is linear because it is defined by a weighted sum of pixel gray
values:

$$\begin{aligned}
f[n, m] * \left(h_1[n, m] + h_2[n, m]\right) &= \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} f[i, j] \cdot \left(h_1[n - i,\, m - j] + h_2[n - i,\, m - j]\right) \\
&= \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} \left(f[i, j] \cdot h_1[n - i,\, m - j] + f[i, j] \cdot h_2[n - i,\, m - j]\right) \\
&= \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} f[i, j] \cdot h_1[n - i,\, m - j] + \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} f[i, j] \cdot h_2[n - i,\, m - j]
\end{aligned}$$

$$f[n, m] * \left(h_1[n, m] + h_2[n, m]\right) = f[n, m] * h_1[n, m] + f[n, m] * h_2[n, m]$$

The linearity of convolution allows new kernels to be created from sums or differences
of other kernels. For example, consider the sum of three 3 x 3 kernels whose actions
have already been considered:

$$h[n, m] = \frac{1}{3} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} + \frac{1}{3} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} + \frac{1}{3} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

The output image g[n, m] is the average of three images: the input and copies shifted
one pixel up and down. Therefore, each pixel in g[n, m] is the average of three
pixels in a vertical line; g[n, m] is blurred vertically. Note that the kernels have been
normalized so that the sum of the elements is unity. This ensures that the gray levels
of the filtered image will fall within the dynamic range of the input image, but they
may not be integers. The output of a lowpass filter must typically be requantized.
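The vertical averager offers a convenient numerical check of linearity. In this sketch the helper `convolve3x3`, the zero padding, the random 6 x 6 test image, and the kernel names are illustrative assumptions.

```python
import numpy as np

def convolve3x3(f, h):
    """3 x 3 convolution with zero padding, written as a weighted sum of shifted copies."""
    padded = np.pad(f.astype(float), 1, mode="constant")
    rows, cols = f.shape
    hf = h[::-1, ::-1]                      # rotate the kernel by 180 degrees
    return sum(hf[a, b] * padded[a:a + rows, b:b + cols]
               for a in range(3) for b in range(3))

center = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]]) / 3.0   # scaled identity
above = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]]) / 3.0    # element above center
below = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0]]) / 3.0    # element below center
h = center + above + below                                    # vertical 3-pixel averager

rng = np.random.default_rng(3)
f = rng.integers(0, 256, size=(6, 6))

lhs = convolve3x3(f, h)
rhs = convolve3x3(f, center) + convolve3x3(f, above) + convolve3x3(f, below)
vertical_mean = (f[:-2] + f[1:-1] + f[2:]) / 3.0   # direct 3-pixel vertical average
```

Filtering once with the summed kernel gives the same result as summing the three separately filtered images, and away from the top and bottom edges the output equals the direct vertical average. The non-integer outputs illustrate why requantization is typically needed.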


16.2.2  Convolutions  —  Edges  of  the  Image 

Because a convolution is the sum of weighted gray values in the neighborhood of the
input pixel, there is a question of what to do near the edge of the image, i.e., when
the neighborhood of pixels in the kernel extends “over the edge” of the image. The
common solutions are:


1. consider any pixel in the neighborhood that would extend off the image to have
gray value “0”;

2. consider pixels off the edge to have the same gray value as the edge pixel;

3. consider the convolution in any such case to be undefined; and

4. define any pixels over the edge of the image to have the same gray value as
pixels on the opposite edge.


On the face of it, the fourth of these alternatives may seem to be ridiculous, but
it is simply a statement that the image is assumed to be periodic, i.e., that:

$$f[n, m] = f[n + kN,\, m + \ell M]$$

where N and M are the numbers of pixels in a row or column, respectively, and k, ℓ
are integers. In fact, this is the most common case, and will be treated in depth when
global operators are discussed.
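The four edge strategies map naturally onto array padding. The sketch below uses NumPy's `np.pad` with its `constant`, `edge`, and `wrap` modes; the 3 x 3 test image and the helper `valid_region` are assumptions for illustration.

```python
import numpy as np

f = np.arange(1, 10).reshape(3, 3)   # hypothetical 3 x 3 image with values 1..9

zero_padded = np.pad(f, 1, mode="constant", constant_values=0)  # option 1: zeros
edge_padded = np.pad(f, 1, mode="edge")                         # option 2: replicate edge pixels
periodic = np.pad(f, 1, mode="wrap")                            # option 4: periodic image

def valid_region(shape, kernel_size):
    """Option 3: size of the output when pixels near the border are left undefined."""
    k = kernel_size - 1
    return (shape[0] - k, shape[1] - k)
```

With a 3 x 3 kernel, only the single center pixel of this image is defined under option 3, while the other three options produce a full-size output with different assumptions about the missing neighbors.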
Possible strategies for dealing with the edge of the image in 2-D convolution: (a) the
input image is padded with zeros; (b) the input image is padded with the same gray
values “on the edge;” (c) values “off the edge” are ignored; (d) pixels off the edge are
assigned the values on the opposite edge; this assumes that the input image is
periodic.

The 3 x 3 image f[n, m] is outlined by the bold-face box and the assumed gray
values of pixels off the edge of the image are shown in light face for four cases; the
presence of an “x” in a convolution kernel indicates that the output gray value is
undefined.

16.2.3  Convolutions  —  Computational  Intensity 

Evaluating convolutions with large kernels in a serial processor used to be very slow.
For example, convolution of a 512²-pixel image with an M x M kernel requires
2 · 512² · M² operations (multiplications and additions), for a total of 4.7 · 10⁶
operations with a 3 x 3 kernel and 25.7 · 10⁶ operations with a 7 x 7 kernel (these
operations generally are performed on floating-point data). The increase in
computations as M² ensures that convolution of large images with large kernels is not
very practical by serial brute-force means. In the discussion of global operations to
follow, we will introduce an alternative method for computing convolutions via the
Fourier transform that requires many fewer operations for large images.
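The operation counts quoted above follow directly from the formula, as this short sketch shows (the function name is a hypothetical label for the arithmetic):

```python
def direct_convolution_ops(image_side, kernel_side):
    """Multiplications plus additions for brute-force convolution: 2 * N^2 * M^2."""
    return 2 * image_side ** 2 * kernel_side ** 2

ops_3x3 = direct_convolution_ops(512, 3)   # 4,718,592, about 4.7 million
ops_7x7 = direct_convolution_ops(512, 7)   # 25,690,112, about 25.7 million
```

Going from a 3 x 3 to a 7 x 7 kernel multiplies the work by 49/9, a little more than a factor of five.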

16.2.4  Smoothing  Kernels  —  Lowpass  Filtering 

If  all  elements  of  a  convolution  kernel  have  the  same  algebraic  sign,  then  the  operator 
O  sums  weighted  gray  values  of  input  pixels  in  the  neighborhood;  if  the  sum  of  the 
elements  is  one,  then  the  process  computes  a  weighted  average  of  the  gray  values. 
Averaging  reduces  the  variability  of  the  gray  values  of  the  input  image;  it  “smooths” 
the  function: 
Local averaging decreases the “variability” (variance) of pixel gray values

For a uniform averaging kernel of a fixed size, functions that oscillate over a period
comparable to or shorter than the kernel width (e.g., short-period, high-frequency
sinusoids) will be averaged more than slowly varying terms. In other words, local
averaging attenuates the high sinusoidal frequencies while passing the low frequencies
relatively undisturbed - local averaging operators are lowpass filters. If the kernel
size doubles, input sinusoids with twice the period (half the spatial frequency) will be
equivalently affected. This action was discussed in the section on realistic sampling;
a finite detector averages the signal over its width and reduces modulation of the
output signal to a greater degree at higher frequencies.


Local  averaging  operators  are  lowpass  filters 

Obviously, averaging kernels reduce the visibility of additive noise by spreading the
difference in gray value of a noisy pixel from the background over its neighbors. By
analogy with temporal averaging, spatial averaging of noise increases SNR by the
square root of the number of pixels averaged if the noise is random and the averaging
weights are identical.
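The square-root rule for spatial averaging can be checked on a field of pure noise. The image size, the unit noise level, and the helper `uniform_average` are assumptions of this sketch; only pixels whose neighborhood lies fully inside the image are kept.

```python
import numpy as np

def uniform_average(f, size):
    """Average over a size x size neighborhood; only fully covered pixels are returned."""
    rows, cols = f.shape
    out_r, out_c = rows - size + 1, cols - size + 1
    acc = np.zeros((out_r, out_c))
    for a in range(size):
        for b in range(size):
            acc += f[a:a + out_r, b:b + out_c]
    return acc / size ** 2

rng = np.random.default_rng(4)
noise = rng.normal(0.0, 1.0, size=(256, 256))   # uncorrelated zero-mean noise

ratio_3 = noise.std() / uniform_average(noise, 3).std()   # 9 pixels averaged
ratio_5 = noise.std() / uniform_average(noise, 5).std()   # 25 pixels averaged
```

The two ratios should come out near 3 and 5, the square roots of the 9 and 25 pixels averaged.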


The  action  of  an  averager  can  be  directional: 


[Kernels h1[n, m], h2[n, m], and h3[n, m]]


The “rotation” or “reversal” of the convolution kernel means that the action of
h3[n, m] blurs diagonally along the direction at 90° to that in the kernel.
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The  elements  of  an  averaging  kernel  need  not  be  identical,  e.g., 
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averages  over  the  entire  window  but  the  output  is  primarily  influenced  by  the  center 
pixel; the output is blurred less than in the case when all elements are identical.

Other 2-D discrete averaging kernels may be constructed by “orthogonal multiplication,” e.g., we can construct the common 3×3 uniform averager via the product of two orthogonal 1-D uniform averagers:

$$\frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \times \frac{1}{3}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

The associated transfer function is the orthogonal product of the individual 1-D transfer functions. Note that the 1-D kernels can be different, such as a 3-pixel uniform averager along the n-direction and a 2-pixel uniform averager along m:


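$$\frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \times \frac{1}{2}\begin{bmatrix} \tfrac{1}{2} \\ 1 \\ \tfrac{1}{2} \end{bmatrix} = \frac{1}{6}\begin{bmatrix} +\tfrac{1}{2} & +\tfrac{1}{2} & +\tfrac{1}{2} \\ +1 & +1 & +1 \\ +\tfrac{1}{2} & +\tfrac{1}{2} & +\tfrac{1}{2} \end{bmatrix}$$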
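The “orthogonal multiplication” is an outer product, which the sketch below illustrates. It assumes the 2-pixel averager is written with half-weight endpoints (three samples 1/4, 1/2, 1/4); the variable names are illustrative only.

```python
import numpy as np

row3 = np.ones(3) / 3                    # 3-pixel uniform averager along n
col3 = np.ones(3) / 3                    # 3-pixel uniform averager along m
col2 = np.array([0.5, 1.0, 0.5]) / 2     # 2-pixel averager, half-weight endpoints (assumed form)

h_3x3 = np.outer(col3, row3)             # every element equals 1/9
h_3x2 = np.outer(col2, row3)             # rows weighted 1/12, 1/6, 1/12

# Separability carries over to the frequency domain: the 2-D DFT of the
# kernel equals the outer product of the two 1-D DFTs.
N = 8
H_2d = np.fft.fft2(h_3x2, s=(N, N))
H_sep = np.outer(np.fft.fft(col2, N), np.fft.fft(row3, N))
print(np.allclose(H_2d, H_sep))
```

Both kernels sum to one, so each computes a weighted average.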

Lowpass-Filtered  Images 


Examples of lowpass filtering: (a) f[n, m]; (b) after uniform averaging over a 3×3 neighborhood; (c) after uniform averaging over a 5×5 neighborhood. Note that the “fine structure” (such as it is) becomes less visible as the neighborhood size increases.




16.2.5  Frequency  Response  of  1-D  Averagers 


The  impulse  response  of  the  linear  operator  that  averages  uniformly  is  a  unit-area 
rectangle: 


$$h[x] = \frac{1}{|b|}\,\mathrm{RECT}\left[\frac{x}{b}\right]$$


where  b  is  the  width  of  the  averaging  region.  The  corresponding  continuous  transfer 
function  is: 


$$H[\xi] = \mathrm{SINC}[b \cdot \xi]$$


In the discrete case, the rectangular impulse response is sampled and the width b is measured in units of Δx. If b/Δx is even, the amplitudes of the endpoint samples are 1/2. We consider the cases where b = 2·Δx, 3·Δx, and 4·Δx. The discrete impulse responses of uniform averagers that are two and three pixels wide have three nonzero samples:

$$b = 2\cdot\Delta x:\quad h_2[n] = \frac{1}{4}\left(\delta_d[n+1] + 2\cdot\delta_d[n] + \delta_d[n-1]\right)$$

$$b = 3\cdot\Delta x:\quad h_3[n] = \frac{1}{3}\left(\delta_d[n+1] + \delta_d[n] + \delta_d[n-1]\right)$$


The  linearity  of  the  DFT  ensures  that  the  corresponding  transfer  function  may  be 
constructed  by  summing  transfer  functions  for  the  identity  operator  and  for  transla¬ 
tions  by  one  sample  each  to  the  left  and  right.  The  resulting  transfer  functions  may 
be  viewed  as  discrete  approximations  to  the  continuous  SINC  functions: 


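$$H_2[k] = \frac{1}{4}e^{-2\pi i k\cdot\Delta\xi} + \frac{1}{2} + \frac{1}{4}e^{+2\pi i k\cdot\Delta\xi}, \qquad -\frac{N}{2} \le k \le \frac{N}{2}-1$$

$$= \frac{1}{2}\left(1 + \cos[2\pi(k\cdot\Delta\xi)]\right) \equiv \mathrm{SINC}_d\left[k;\frac{N}{2}\right]$$

$$H_3[k] = \frac{1}{3}e^{-2\pi i k\cdot\Delta\xi} + \frac{1}{3} + \frac{1}{3}e^{+2\pi i k\cdot\Delta\xi}$$

$$= \frac{1}{3}\left(1 + 2\cos[2\pi(k\cdot\Delta\xi)]\right) \equiv \mathrm{SINC}_d\left[k;\frac{N}{3}\right]$$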

Note that both $H_2[k]$ and $H_3[k]$ have unit amplitude at the origin because of the discrete central ordinate theorem. The “zero crossings” of the two-pixel averager would be located at $\xi = \pm\frac{1}{2}$ cycle per sample, i.e., at the positive and negative Nyquist frequencies, but the index convention admits only the negative index, $k = -\frac{N}{2}$. This sinusoid oscillates with a period of two samples and is thus averaged to zero by the two-pixel averager. The three-pixel averager should “block” any sinusoid with a period 3·Δx, which can certainly be constructed because the Nyquist condition is satisfied. The zero crossings of the transfer function “should” be located at $\xi = \pm\frac{1}{3}$ cycle per sample, which “would” occur at the indices $k = \pm\frac{N}{3}$. However, because $\pm\frac{N}{3}$ is not an integer for the usual even array sizes (e.g., powers of two such as N = 64), the zero crossings of the spectrum occur between samples in the discrete spectrum. In other words, the discrete transfer function of the three-pixel averager has no zeros and thus must “pass” all sampled and unaliased sinusoidal components (albeit with attenuation). This seems to be a paradox: a sinusoid with a period of three pixels can be sampled without aliasing but this function is not “blocked” by a three-pixel averager, whereas the two-pixel sinusoid is blocked by a two-pixel averager. The reason is that such an array contains a noninteger number of periods of length 3·Δx. Thus there will be “leakage” in the spectrum of the sampled function, and the resulting “spurious” frequency components reach the output. As a final observation, note also that the discrete transfer functions approach the edges of the array “smoothly” (without “cusps”) in both cases, as shown in the figure.
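These statements about the zeros can be checked with a short computation. The sketch below assumes N = 64 and Δx = 1 sample, so that k·Δξ = k/N; the helper `centered_dft` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

N = 64
k = np.arange(-N // 2, N // 2)           # index convention -N/2 <= k <= N/2 - 1

def centered_dft(weights):
    """DFT of a 3-sample kernel centered at n = 0, ordered to match k above."""
    h = np.zeros(N)
    h[[-1, 0, 1]] = weights              # samples at n = -1, 0, +1 (periodic array)
    return np.fft.fftshift(np.fft.fft(h))

H2 = centered_dft([0.25, 0.5, 0.25])     # two-pixel averager
H3 = centered_dft([1 / 3, 1 / 3, 1 / 3]) # three-pixel averager

# Both spectra are real because the kernels are even.
print(abs(H2[0]))                        # value at k = -N/2 (Nyquist): essentially zero
print(np.min(np.abs(H3)))                # strictly positive: no sample of H3 is zero
```

The smallest magnitude of the three-pixel transfer function occurs at the integer index nearest N/3, where the continuous SINC would cross zero.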


Comparison of 2- and 3-pixel uniform averagers for N = 64: (a) central region of the impulse response $h_2[n] = \frac{1}{2}\mathrm{RECT}\left[\frac{n}{2}\right]$; (b) discrete transfer function $H_2[k]$ compared to the continuous analogue $H_2[\xi] = \mathrm{SINC}[2\xi]$ out to the Nyquist frequency $\xi = -\frac{1}{2}$; (c) $h_3[n] = \frac{1}{3}\mathrm{RECT}\left[\frac{n}{3}\right]$, which has the same support as $h_2[n]$; (d) $H_3[k]$ compared to $H_3[\xi] = \mathrm{SINC}[3\xi]$, showing the “smooth” transition at the edge of the array.


The discrete impulse response of the four-pixel averager has five nonzero pixels:

$$b = 4\cdot\Delta x:\quad h_4[n] = \frac{1}{4}\left(\frac{1}{2}\delta_d[n+2] + \delta_d[n+1] + \delta_d[n] + \delta_d[n-1] + \frac{1}{2}\delta_d[n-2]\right)$$

The linearity of the DFT ensures that the corresponding transfer function may be constructed by summing the transfer function of the three-pixel averager scaled by 3/4 with the transfer functions for translation by two samples each to the left and right:

$$H_4[k] = \frac{1}{8}e^{-2\pi i k\cdot 2\Delta\xi} + \frac{1}{4}e^{-2\pi i k\cdot\Delta\xi} + \frac{1}{4} + \frac{1}{4}e^{+2\pi i k\cdot\Delta\xi} + \frac{1}{8}e^{+2\pi i k\cdot 2\Delta\xi}$$

$$= \frac{1}{4}\left(1 + 2\cos[2\pi(k\cdot\Delta\xi)] + \cos[2\pi(k\cdot 2\Delta\xi)]\right)$$

which also may be thought of as a discrete approximation of a SINC function: $\mathrm{SINC}_d\left[k;\frac{N}{4}\right]$. This discrete transfer function has zeros located at $\xi = \pm\frac{1}{4}$ cycle per sample, which correspond to $k = \pm\frac{N}{4}$. Therefore the four-pixel averager “blocks” any sampled sinusoid with period 4·Δx from reaching the output. Again the transfer function has “smooth” transitions of amplitude at the edges of the array, thus preventing “cusps” in the periodic spectrum, as shown in the figure.


Four-pixel averager for N = 64: (a) central region of impulse response $h_4[n] = \frac{1}{4}\mathrm{RECT}\left[\frac{n}{4}\right]$; (b) discrete transfer function $H_4[k]$ compared to the continuous transfer function $\mathrm{SINC}[4\xi]$, showing the smooth decay of the discrete case at the edges of the array.
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The  general  expression  for  the  discrete  SINC  function  in  the  frequency  domain 
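suggested by these results for $-\frac{N}{2} \le k \le \frac{N}{2}-1$ is:

$$\mathrm{SINC}_d\left[k;\frac{N}{w}\right] = \begin{cases} \dfrac{1}{w}\left(1 + 2\displaystyle\sum_{\ell=1}^{\frac{w-1}{2}} \cos[2\pi(k\cdot\ell\cdot\Delta\xi)]\right) & \text{if } w \text{ is odd} \\[3ex] \dfrac{1}{w}\left(1 + \cos\left[2\pi\left(k\cdot\dfrac{w}{2}\cdot\Delta\xi\right)\right] + 2\displaystyle\sum_{\ell=1}^{\frac{w}{2}-1} \cos[2\pi(k\cdot\ell\cdot\Delta\xi)]\right) & \text{if } w \text{ is even} \end{cases}$$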
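The general expression can be compared with a direct DFT of the sampled rectangle. This is a sketch under the assumptions Δx = 1 sample (so k·Δξ = k/N) and half-weight endpoints for even widths; `sinc_d` and `averager_dft` are hypothetical helper names.

```python
import numpy as np

def sinc_d(k, w, N):
    """Closed-form transfer function of a w-pixel uniform averager."""
    phase = 2 * np.pi * np.atleast_1d(k) / N
    if w % 2:                                        # odd width
        ells = np.arange(1, (w - 1) // 2 + 1)
        total = 1 + 2 * np.cos(np.outer(phase, ells)).sum(axis=1)
    else:                                            # even width
        ells = np.arange(1, w // 2)
        total = (1 + np.cos(phase * w / 2)
                 + 2 * np.cos(np.outer(phase, ells)).sum(axis=1))
    return total / w

def averager_dft(w, N):
    """DFT of the sampled unit-area rectangle of width w (centered at n = 0)."""
    half = w // 2
    n = np.arange(-half, half + 1)
    weights = np.ones(n.size)
    if w % 2 == 0:
        weights[[0, -1]] = 0.5                       # half-weight endpoint samples
    h = np.zeros(N)
    h[n % N] = weights / w
    return np.fft.fft(h).real                        # even kernel: imaginary part vanishes

N = 64
for w in (2, 3, 4, 5):
    print(w, np.allclose(sinc_d(np.arange(N), w, N), averager_dft(w, N)))
```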


16.2.6  2-D  Averagers 


Effect  of  Lowpass  Filtering  on  the  Histogram 


Because an averaging kernel reduces pixel-to-pixel variations in gray level (and hence the variance of additive random noise in the image), we would expect clusters of pixels in the histogram of an averaged image to be taller and thinner than in the original image. It should be easier to segment objects based on average gray level from the histogram of an averaged image. To illustrate, we reconsider the example of the house-tree image. The image in blue light and its histogram before and after averaging with a 3×3 kernel are shown below. Note that there are four fairly distinct clusters in the histogram of the averaged image, corresponding to the house, grass+tree, sky, and clouds+door (from dark to bright). The small clusters at the ends are more difficult to distinguish on the original histogram.

[Figure: the color image, the histograms of the red, green, and blue bands, and the 2-D histograms, before and after the 3×3 blur.]

Effect  of  blurring  on  the  histogram:  the  64  x  64  color  image,  the  histograms  of  the  3 
bands,  and  the  3  2-D  histograms  are  shown  at  top;  the  same  images  and  histograms 
after  blurring  with  a  3x3  kernel  are  at  the  bottom,  showing  the  concentration  of 
histogram  clusters  resulting  from  image  blur. 


Note  that  the  noise  visible  in  uniform  areas  of  the  images  (e.g.,  the  sky  in  the 
blue  image)  has  been  noticeably  reduced  by  the  averaging,  and  thus  the  widths  of  the 
histogram  clusters  have  decreased.  The  downside  of  this  process  is  that  pixels  on  the 
boundaries  of  objects  now  exhibit  blends  of  the  colors  of  both  bounding  objects,  and 
thus  will  not  be  as  easy  to  segment. 


16.2.7  Differencing  Kernels  —  Highpass  Filters 

A  kernel  with  both  positive  and  negative  terms  computes  differences  of  neighboring 
pixels.  From  the  previous  discussion,  it  is  probably  apparent  that  the  converse  of  the 
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statement  about  local  averaging  also  is  true. 


Local  Differencing  increases  the  variance  of  pixel  gray  values 

The difference of adjacent pixels with identical gray levels cancels, while differences between adjacent pixels with different gray levels are retained and thus emphasized. Since high-frequency sinusoids vary over shorter distances, differencing operators will enhance them and attenuate slowly varying (i.e., lower-frequency) terms.


Differencing  operators  are  highpass  filters 

because a differencing operator will “block” low-frequency sinusoids and “pass” those with high frequencies.

Subtraction of adjacent pixels can result in negative gray values; this is the spatial analogy of temporal differencing for change detection. The gray values of the output image must be biased “up” for display by adding some constant gray level to all image pixels, e.g., if $g_{min} < 0$, then the negative gray values may be displayed by adding the level $|g_{min}|$ to all pixel gray values.

The  discrete  analogue  of  differentiation  may  be  derived  from  the  definition  of  the 
continuous  derivative: 

$$\frac{df}{dx} = \lim_{\tau \to 0}\left(\frac{f[x+\tau] - f[x]}{\tau}\right)$$

In the discrete case, the smallest nonzero value of τ is the sampling interval Δx, and thus the discrete derivative is:

$$\frac{1}{\Delta x}\left(f[(n+1)\cdot\Delta x] - f[n\cdot\Delta x]\right) = \frac{1}{\Delta x}\, f[n\cdot\Delta x] * \left(\delta[n+1] - \delta[n]\right)$$

In words, the discrete derivative is the scaled difference of the values at the samples indexed by n + 1 and by n. By setting Δx = 1 sample, the leading scale factor may be ignored. The 1-D derivative operator may be implemented by discrete convolution with a 1-D kernel that has two nonzero elements; we will write it with three elements to clearly denote the sample indexed by n = 0.


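$$f[n] * \left(\delta[n+1] - \delta[n]\right) = f[n] * \begin{bmatrix} +1 & -1 & 0 \end{bmatrix} = f[n] * \partial[n]$$

where $\partial[n] = \begin{bmatrix} +1 & -1 & 0 \end{bmatrix}$ (with the center element at n = 0) is the discrete impulse response of differentiation, which is perhaps better called a differencing operator. Note that the impulse response may be decomposed into its even and odd parts.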
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The  even  part  is  a  weighted  difference  of  the  identity  operator  and  the  three-pixel 
averager,  while  the  odd  part  computes  differences  of  pixels  separated  by  two  sample 
intervals. 
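The decomposition can be written out explicitly. As a worked sketch, assuming the three-element form $\partial[n] = [+1 \;\; -1 \;\; 0]$ with the center element at n = 0 (the subscripts "even" and "odd" are labels introduced here):

$$\partial_{even}[n] = \frac{1}{2}\left(\partial[n] + \partial[-n]\right) = \frac{1}{2}\begin{bmatrix} +1 & -2 & +1 \end{bmatrix} = \frac{3}{2}\left(\frac{1}{3}\begin{bmatrix} +1 & +1 & +1 \end{bmatrix} - \begin{bmatrix} 0 & +1 & 0 \end{bmatrix}\right)$$

$$\partial_{odd}[n] = \frac{1}{2}\left(\partial[n] - \partial[-n]\right) = \frac{1}{2}\begin{bmatrix} +1 & 0 & -1 \end{bmatrix}$$

The two parts sum to $[+1 \;\; -1 \;\; 0]$, and the second form of the even part shows the weighted difference of the three-pixel averager and the identity operator.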

The corresponding 2-D derivative kernel in the x-direction is:

$$\partial_x = \begin{bmatrix} 0 & 0 & 0 \\ +1 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
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The  resulting  image  is  equivalent  to  the  difference  between  an  image  translated  one 
pixel  to  the  left  and  an  unshifted  image,  i.e., 


$$\frac{\partial}{\partial x} f[x,y] = \lim_{\Delta x \to 0} \frac{f[x+\Delta x, y] - f[x,y]}{\Delta x}$$

$$\Longrightarrow \frac{\partial}{\partial x} f[x,y] = f[(n+1)\cdot\Delta x,\, m\cdot\Delta y] - f[n\cdot\Delta x,\, m\cdot\Delta y]$$

$$\Longrightarrow \partial_x * f[n,m] = f[n+1,m] - f[n,m]$$

because the minimum nonzero value of the translation is Δx = 1 sample. The corresponding discrete partial derivative in the y-direction is:

$$\partial_y * f[n,m] = f[n,m+1] - f[n,m]$$


The partial derivative in the y-direction is the difference of a replica translated “up” and the original (rows displayed with m increasing upward):

$$\partial_y = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & +1 & 0 \end{bmatrix}$$

This  definition  of  the  derivative  effectively  “locates”  the  edge  of  an  object  at  the  pixel 
immediately  to  the  right  or  above  the  “crack”  between  pixels  that  is  the  actual  edge. 




[Figure: original image of the “edge” and the result after convolution with the first-derivative kernel.]

2-D edge image and the first derivative obtained by convolving with the 3×3 first-derivative kernel in the x-direction, showing that the detected edge in the derivative image is located at the pixel to the right of the edge transition.


“Symmetric”  (actually  antisymmetric  or  odd)  versions  of  the  derivative  operators 
are  sometimes  used  that  evaluate  the  difference  across  two  pixels: 
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These  operators  locate  the  edge  of  an  object  between  two  pixels  symmetrically,  but 
produce  an  output  that  is  two  pixels  wide  centered  about  the  location  of  the  edge: 


[Figure: original image of the “edge” and the result after convolution with the kernel whose rows are (0 0 0), (-1 0 +1), (0 0 0).]

2-D edge image and the first derivative obtained by convolving with the 3×3 “symmetric” first-derivative kernel in the x-direction, showing that the edge is a two-pixel band symmetrically placed about the transition.
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Higher-Order  Derivatives 


The kernels for higher-order derivatives are easily computed because convolution is associative. The convolution kernel for the 1-D second derivative is obtained by autoconvolving the kernel for the 1-D first derivative:

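$$\frac{\partial^2}{\partial x^2} f[x,y] = \frac{\partial}{\partial x}\left(\frac{\partial}{\partial x} f[x,y]\right) = \frac{\partial}{\partial x}\left(\lim_{\Delta x \to 0}\frac{f[x+\Delta x,y] - f[x,y]}{\Delta x}\right)$$

$$= \lim_{\Delta x \to 0}\frac{1}{\Delta x}\left(\lim_{\Delta x \to 0}\left(\frac{f[x+2\cdot\Delta x,y] - f[x+\Delta x,y]}{\Delta x}\right) - \lim_{\Delta x \to 0}\left(\frac{f[x+\Delta x,y] - f[x,y]}{\Delta x}\right)\right)$$

$$= \lim_{\Delta x \to 0}\left(\frac{f[x+2\cdot\Delta x,y] - 2f[x+\Delta x,y] + f[x,y]}{(\Delta x)^2}\right)$$

$$\Longrightarrow \partial_x^2 * f[n,m] = f[n+2,m] - 2f[n+1,m] + f[n,m]$$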


which  may  be  evaluated  by  convolution  with  a  five-element  kernel,  which  is  displayed 
in  a  5  x  5  window  to  identify  the  center  pixel 
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This  usually  is  “centered”  in  a  3  x  3  array  by  translating  the  kernel  one  pixel  to  the 
right  and  lopping  off  the  zeros: 
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Except  for  the  one-pixel  translation,  this  operation  generates  the  same  image  pro¬ 
duced  by  the  cascade  of  two  first-derivative  operators.  The  corresponding  2-D  second 
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partial  derivative  kernels  in  the  y-direction  are: 
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The  derivation  may  be  extended  to  derivatives  of  still  higher  order  by  convolving 
kernels  to  obtain  the  kernels  for  the  1-D  third  and  fourth  derivatives.  The  third 
derivative  may  be  displayed  in  a  7  x  7  kernel: 


$$\partial_x^3 = \partial_x * \partial_x * \partial_x$$

which also is usually translated and truncated:

43  = 
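The repeated convolutions that generate the higher-order kernels can be reproduced with a few lines. This is a minimal sketch, assuming the three-element first-difference kernel [+1, -1, 0] with its center at n = 0; the array names are illustrative.

```python
import numpy as np

d1 = np.array([+1, -1, 0])        # first difference, samples at n = -1, 0, +1
d2 = np.convolve(d1, d1)          # five samples at n = -2 ... +2
d3 = np.convolve(d2, d1)          # seven samples at n = -3 ... +3

print(d2)                         # [ 1 -2  1  0  0]
print(d3)                         # [ 1 -3  3 -1  0  0  0]

# "Centering" the second derivative: translate one pixel to the right
# and drop the trailing zeros.
d2_centered = d2[:3]              # [ 1 -2  1]

# Sanity check on a quadratic: the second difference of n**2 is the constant 2.
n = np.arange(10)
second_diff = np.convolve(n**2, d2_centered, mode="valid")
print(second_diff)
```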


The gray-level extrema of the image produced by a differencing operator indicate pixels in regions of rapid variation, e.g., at the edges of distinct objects. The visibility of these pixels can be further enhanced by a subsequent contrast enhancement or thresholding operation.

Because they compute weighted differences in pixel gray value, differencing operators also will enhance the visibility of noise in an image. Consider the 1-D example where the input image f[n] is a 3-bar chart with added noise, so that the signal-to-noise ratio (SNR) is 4. The convolutions of the input f[n] with an averaging kernel $h_1[n] = \frac{1}{3}\begin{bmatrix} +1 & +1 & +1 \end{bmatrix}$ and with a differencing kernel $h_2[n] = \begin{bmatrix} -1 & +3 & -1 \end{bmatrix}$ are shown:


Effect of averaging and sharpening on an image with noise: (a) f[n] + noise; (b) after averaging over a 3-pixel neighborhood with $h_1[n] = \frac{1}{3}\begin{bmatrix} +1 & +1 & +1 \end{bmatrix}$; (c) after applying a sharpening operator $h_2[n] = \begin{bmatrix} -1 & +3 & -1 \end{bmatrix}$.

Note  that  the  noise  is  diminished  by  the  averaging  kernel  and  enhanced  by  the 
differencing  kernel. 
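The size of both effects can be estimated for uncorrelated noise, where the output standard deviation is the input standard deviation multiplied by the square root of the sum of the squared kernel weights. The sketch below assumes unit-variance Gaussian noise and uses the two kernels named in the caption; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)                  # fixed seed (arbitrary choice)
noise = rng.normal(0.0, 1.0, size=100_000)      # assumed white Gaussian noise

h_avg = np.array([1.0, 1.0, 1.0]) / 3           # 3-pixel averager
h_sharp = np.array([-1.0, 3.0, -1.0])           # differencing (sharpening) kernel

def noise_gain(kernel):
    """Measured ratio of output to input noise standard deviation."""
    return np.convolve(noise, kernel, mode="valid").std() / noise.std()

gain_avg = noise_gain(h_avg)                    # predicted sqrt(1/3), about 0.58
gain_sharp = noise_gain(h_sharp)                # predicted sqrt(11), about 3.32
print(gain_avg, np.sqrt(np.sum(h_avg**2)))
print(gain_sharp, np.sqrt(np.sum(h_sharp**2)))
```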


16.2.8  Frequency  Responses  of  2-D  Derivative  Operators: 
First  Derivatives 

The  2-D  x-derivative  operator  denoted  by  dx: 
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has  the  associated  discrete  transfer  function  obtained  via  the  discrete  Fourier  trans¬ 
form: 
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The  differencing  operator  in  the  ^-direction  and  its  associated  transfer  function  are 
obtained  by  rotating  the  expressions  just  derived  by  +|  radians.  The  kernel  is: 
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+1 

0 

The corresponding transfer function is the rotated version of $\mathcal{F}_2\{\partial_x\}$:

$$\mathcal{F}_2\{\partial_y\} = H_{(\partial_y)}[k,\ell] = 1[k]\cdot\left(-1 + e^{+2\pi i \frac{\ell}{N}}\right)$$

We can also define 2-D differences along the diagonal directions:


d(e=+f)  ~ 


d(e=+¥)  ~ 
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The  angle  in  radians  has  been  substituted  for  the  subscript.  Again,,  the  continuous 
distance between the elements has been scaled by $\sqrt{2}$.


1-D  Antisymmetric  Differentiation  Kernel 


We  can  also  construct  a  discrete  differentiator  with  odd  symmetry  by  placing  the 
components  of  the  discrete  “doublet”  at  samples  n  =  ±1: 


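$$(\partial_x)_2 = \begin{bmatrix} +1 & 0 & -1 \end{bmatrix} = \delta_d[n+1] - \delta_d[n-1]$$

This impulse response is proportional to the odd part of the original 1-D differentiator. The corresponding transfer function is again easy to evaluate via the appropriate combination of translation operators. Because $(\partial_x)_2$ is odd, the real part of the discrete transfer function is zero, as shown in the figure:

$$H[k] = e^{+2\pi i \frac{k}{N}} - e^{-2\pi i \frac{k}{N}} = 2i\,\sin\left[2\pi\frac{k}{N}\right]$$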
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Note  that  this  transfer  function  evaluated  at  the  Nyquist  frequency  is: 


$$\left|H\left[k = -\frac{N}{2}\right]\right| = 2\,\left|\sin[-\pi]\right| = 0$$


which means that this differentiator also “blocks” the Nyquist frequency. This may be seen by convolving $(\partial_x)_2$ with a sinusoidal function that oscillates with a period of two samples. Adjacent positive extrema are multiplied by ±1 in the kernel and thus cancel. Also note that the discrete transfer function amplifies lower frequencies more and higher frequencies less than the continuous transfer function.
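These properties can be verified numerically. The sketch below assumes N = 64, Δx = 1 sample, and the antisymmetric kernel δ[n+1] - δ[n-1]; the comparison with the continuous derivative uses H[ξ] = i2πξ with ξ = k/N.

```python
import numpy as np

N = 64
k = np.arange(-N // 2, N // 2)            # -N/2 <= k <= N/2 - 1
xi = k / N                                # spatial frequency in cycles per sample

h = np.zeros(N)
h[-1] = +1.0                              # sample at n = -1
h[+1] = -1.0                              # sample at n = +1
H = np.fft.fftshift(np.fft.fft(h))        # reorder to match k

discrete_mag = np.abs(H)                  # equals 2 |sin(2 pi k / N)|
continuous_mag = 2 * np.pi * np.abs(xi)   # magnitude of i 2 pi xi

print(np.max(np.abs(H.real)))             # real part vanishes (odd kernel)
print(discrete_mag[0])                    # value at the Nyquist index k = -N/2
low = (np.abs(xi) > 0) & (np.abs(xi) < 0.25)
print(np.all(discrete_mag[low] > continuous_mag[low]))   # larger at low frequencies
```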
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(a)  h[n] 


(b) Real Part of Transfer Function; (c) Imaginary Part of Transfer Function; (d) Magnitude; (e) Phase

1-D discrete antisymmetric derivative operator and its transfer function: (a) 1-D impulse response h[n]. Because h[n] is odd, H[k] is imaginary and odd. The samples of the discrete transfer function are shown as (b) real part, (c) imaginary part, (d) magnitude, and (e) phase, along with the corresponding continuous transfer function $H[\xi] = i2\pi\xi$. Note that the magnitude of the discrete transfer function is attenuated near the Nyquist frequency and that its phase is identical to that of the transfer function of the continuous derivative.
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1-D  Second  Derivative 


The  impulse  response  and  transfer  function  of  the  continuous  second  derivative  are 
easily  obtained  from  the  derivative  theorem: 


$$h[x] = \delta''[x]$$

$$H[\xi] = (2\pi i \xi)^2 = -4\pi^2\xi^2$$

Again,  different  forms  of  the  discrete  second  derivative  may  be  defined.  One  form 
is  obtained  by  differentiating  the  first  derivative  operator  via  discrete  convolution  of 
two  replicas  of  dx.  The  result  is  a  five-pixel  kernel  including  two  null  weights: 


$$\partial_x * \partial_x = \begin{bmatrix} +1 & -2 & +1 & 0 & 0 \end{bmatrix}$$
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The  corresponding  discrete  transfer  function  is  obtained  by  substituting  results  from 
the  translation  operator: 


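$$H[k] = e^{+2\pi i \frac{2k}{N}} - 2\,e^{+2\pi i \frac{k}{N}} + 1 = e^{+2\pi i \frac{k}{N}}\left(e^{+2\pi i \frac{k}{N}} - 2 + e^{-2\pi i \frac{k}{N}}\right)$$

$$= 2\,e^{+2\pi i \frac{k}{N}}\left(\cos\left[2\pi\frac{k}{N}\right] - 1\right)$$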
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The  leading  linear  phase  factor  usually  is  discarded  to  produce  the  real-valued  and 
symmetric  discrete  transfer  function: 


$$H[k] = 2\left(\cos\left[2\pi\frac{k}{N}\right] - 1\right)$$


Deletion  of  the  linear  phase  is  the  same  as  translation  of  the  original  discrete  second 
derivative  kernel  by  one  pixel  to  the  right.  The  discrete  impulse  response  for  this 
symmetric discrete kernel is also real valued and symmetric:

$$h[n] = \partial_x^2 = \begin{bmatrix} +1 & -2 & +1 \end{bmatrix} = \delta_d[n+1] - 2\,\delta_d[n] + \delta_d[n-1]$$


and the magnitude and phase of the transfer function are:

$$|H[k]| = 2\left(1 - \cos\left[2\pi\frac{k}{N}\right]\right)$$

$$\Phi\{H[k]\} = \begin{cases} -\pi & \text{for } k \neq 0 \\ 0 & \text{for } k = 0 \end{cases} \;=\; \pi\left(-1 + \delta_d[k]\right)$$

as  shown  in  the  figure.  The  amplitude  of  the  discrete  transfer  function  at  the  Nyquist 
frequency  is: 
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2  •  (cos  [ — tt]  —  1)  =  — 4 


while that of the continuous transfer function is $-4\pi^2\left(-\frac{1}{2}\right)^2 = -\pi^2 \cong -9.87$, so the discrete second derivative does not amplify the amplitude at the Nyquist frequency as much as the continuous second derivative. The transfer function is a discrete approximation of the parabola and again approaches the edges of the array smoothly, which ensures smooth periodicity.
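A short numerical check of the Nyquist values follows. It is a sketch that assumes N = 64, Δx = 1 sample, and the centered kernel [+1, -2, +1]; the unshifted FFT ordering places the Nyquist sample at index N/2.

```python
import numpy as np

N = 64
h = np.zeros(N)
h[[-1, 0, 1]] = [1.0, -2.0, 1.0]          # centered second-difference kernel
H = np.fft.fft(h)                         # index 0 is DC, index N/2 is Nyquist

k = np.arange(N)
closed_form = 2 * (np.cos(2 * np.pi * k / N) - 1)
continuous_at_nyquist = -4 * np.pi**2 * 0.5**2   # -pi**2

print(H[N // 2].real)                     # discrete amplitude at Nyquist
print(continuous_at_nyquist)              # continuous amplitude at |xi| = 1/2
```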

Higher-order discrete derivatives may be derived by repeated discrete convolution of $\partial_x$, after discarding any linear-phase factors.

1-D discrete second derivative: (a) impulse response $\partial_x^2$; (b) comparison of discrete and continuous transfer functions.


16.2.9  Laplacian  Operator 

The  Laplacian  operator  for  continuous  functions  was  introduced  in  the  discussion  of 
electromagnetism.  It  is  the  sum  of  orthogonal  second  partial  derivatives: 

$$\nabla^2 f[x,y] = \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right) f[x,y]$$
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The  associated  transfer  function  is  the  negative  quadratic  that  evaluates  to  0  at  DC, 
which  again  demonstrates  that  constant  terms  are  blocked  by  differentiation: 

$$H[\xi,\eta] = -4\pi^2\left(\xi^2 + \eta^2\right)$$

The  discrete  Laplacian  operator  is  the  sum  of  the  orthogonal  2-D  second-derivative 
kernels: 


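$$\partial_x^2 + \partial_y^2 = \nabla_d^2 = \begin{bmatrix} 0 & 0 & 0 \\ +1 & -2 & +1 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & +1 & 0 \\ 0 & -2 & 0 \\ 0 & +1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & +1 & 0 \\ +1 & -4 & +1 \\ 0 & +1 & 0 \end{bmatrix}$$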

The  discrete  transfer  function  of  this  “standard”  discrete  Laplacian  kernel  is: 
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The  amplitude  at  the  origin  is  H  [k  =  0,  l  =  0]  =0  and  decays  in  the  horizontal  or 
vertical directions to -4 at the “edge” of the discrete array and to -8 at its corners.


Rotated  Laplacian 

The sum of the second-derivative kernels along the diagonals creates a rotated version of the Laplacian, which is “nearly” equivalent to rotating the operator $\nabla^2$ by $\theta = +\frac{\pi}{4}$ radians:

$$\begin{bmatrix} +1 & 0 & +1 \\ 0 & -4 & 0 \\ +1 & 0 & +1 \end{bmatrix}$$

Derivation of the transfer function is left to the student; its magnitude is zero at the origin and its maximum negative values are located at the horizontal and vertical edges, but the transfer function is zero at the corners.
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(a)  (b) 
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The  real  and  symmetric  transfer  functions  of  Laplacian  operators:  (a)  “normal” 
Laplacian  from  Eq.(20.50);  (b)  rotated  Laplacian  from  Eq.(20.51),  showing  that  the 
amplitude  rises  back  to  0  at  the  corners  of  the  array. 


A 2-D example is shown in the figure, where the input function is nonnegative and the bipolar output g[n, m] is displayed as amplitude and as magnitude |g[n, m]|, which shows that the response is largest at the edges and corners.
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o 


Action of the 2-D discrete Laplacian. The input amplitudes are nonnegative in the interval 0 ≤ f ≤ 1, and the output amplitude is bipolar in the interval -2 ≤ g ≤ +2 in this example. As shown in the magnitude image, the extrema of output amplitude occur at the edges and corners of the input.
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Isotropic  Laplacian: 


A  commonly  used  “isotropic”  Laplacian  is  obtained  by  summing  the  original  and 
rotated  Laplacian  kernels: 
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The  linearity  of  the  DFT  ensures  that  the  transfer  function  of  the  isotropic  Laplacian 
is  the  real-valued  and  symmetric  sum  of  the  “normal”  and  rotated  Laplacians. 
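The behavior of the three Laplacian transfer functions at the origin, edges, and corners of the array can be checked numerically. In this sketch the rotated kernel is an assumption, built as the unscaled sum of the two diagonal second differences, and the isotropic kernel is taken as the sum of the standard and rotated kernels; N = 64 and the helper name `transfer_function` are also illustrative choices.

```python
import numpy as np

N = 64
standard = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
rotated = np.array([[1, 0, 1], [0, -4, 0], [1, 0, 1]], dtype=float)   # assumed form
isotropic = standard + rotated            # ones everywhere, -8 at the center

def transfer_function(kernel):
    """2-D DFT of a 3x3 kernel centered at the origin of an N x N periodic array."""
    h = np.zeros((N, N))
    for dm in (-1, 0, 1):
        for dn in (-1, 0, 1):
            h[dm % N, dn % N] = kernel[dm + 1, dn + 1]
    return np.fft.fftshift(np.fft.fft2(h)).real   # symmetric kernels: real spectra

H_std = transfer_function(standard)
H_rot = transfer_function(rotated)
H_iso = transfer_function(isotropic)

c = N // 2                                # index of k = l = 0 after fftshift
print(H_std[c, c], H_std[c, 0], H_std[0, 0])   # origin, edge, corner
print(H_rot[c, c], H_rot[c, 0], H_rot[0, 0])
```

With these kernels the standard Laplacian evaluates to 0 at the origin, -4 at the horizontal and vertical edges, and -8 at the corners, while the rotated version reaches -8 at the edges and returns to 0 at the corners.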


Generalized  Laplacian 


The  isotropic  Laplacian  just  considered  may  be  written  as  the  difference  of  a  3  x  3 
average  and  a  scaled  discrete  delta  function: 
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which  suggests  that  the  Laplacian  operator  may  be  generalized  to  include  any  operator 
that  computes  the  difference  of  a  scaled  original  image  and  replicas  that  were  blurred 
by  some  averaging  kernel.  For  example,  the  impulse  response  of  the  averager  may  be 
the  2-D  circularly  symmetric  continuous  Gaussian  impulse  response: 

$$h[x,y] = A\,\exp\left[-\pi\,\frac{x^2+y^2}{b^2}\right]$$

where the decay parameter b determines the rate at which the values of the kernel decrease away from the center and the amplitude parameter A often is selected to normalize the sum of the elements of the kernel to unity, thus ensuring that the process computes a weighted average. A normalized discrete approximation of the Gaussian kernel with b = 2·Δx is:

$$h_1[n,m] = \frac{1}{2139}\begin{bmatrix} 1 & 11 & 23 & 11 & 1 \\ 11 & 111 & 244 & 111 & 11 \\ 23 & 244 & 535 & 244 & 23 \\ 11 & 111 & 244 & 111 & 11 \\ 1 & 11 & 23 & 11 & 1 \end{bmatrix} \cong \frac{1}{21}\begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 0 \\ 1 & 2 & 5 & 2 & 1 \\ 0 & 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$$

The corresponding generalized Laplacian operator is the difference of this quantized Gaussian and the 5×5 identity kernel:
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if  a  = 


where  a  is  the  weighting  of  the  identity  image.  Note  that  the  sum  of  the  elements  of 
this  generalized  Laplacian  kernel  is  zero  because  of  the  normalization  of  the  Gaussian 
kernel,  and  so  this  operator  applied  to  an  input  with  uniform  gray  value  will  yield  a 
null  image. 
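The construction can be sketched numerically. The code below samples the Gaussian with b = 2 on a 5×5 grid, quantizes it to integer weights with a center value of 535, normalizes by the sum of the weights, and subtracts the identity kernel. The unit weighting of the identity term and all variable names are assumptions made for illustration.

```python
import numpy as np

b = 2.0                                          # decay parameter in units of samples
n = np.arange(-2, 3)
nn, mm = np.meshgrid(n, n)
gauss = np.exp(-np.pi * (nn**2 + mm**2) / b**2)  # sampled Gaussian, center value 1

weights = np.rint(535 * gauss).astype(int)       # integer weights, center value 535
h_gauss = weights / weights.sum()                # normalized: elements sum to one

identity = np.zeros((5, 5))
identity[2, 2] = 1.0
gen_laplacian = h_gauss - identity               # assumed unit weighting of the identity

print(weights)
print(weights.sum())
print(gen_laplacian.sum())                       # zero to rounding error

# A uniform input therefore produces a null output at every interior pixel.
uniform = np.full((5, 5), 7.0)
print(np.sum(gen_laplacian * uniform))
```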
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16.2.10  Discrete  “Sharpening”  Operators 
1-D  Case 

A  “sharpening”  operator  passes  all  sinusoidal  components  with  no  change  in  phase 
while  amplifying  those  with  large  spatial  frequencies.  This  action  will  tend  to  compen¬ 
sate  for  the  effect  of  lowpass  filtering.  One  example  of  a  1-D  continuous  sharpener  is 
constructed  from  the  second  derivative;  the  amplitude  of  its  transfer  function  is  unity 
at  the  origin  and  rises  as  £2  for  larger  spatial  frequencies: 

$$H[\xi] = 1 + 4\pi^2\xi^2$$

The  corresponding  continuous  1-D  impulse  response  is  the  difference  of  the  identity 
and  second-derivative  kernels 


$$h[x] = \delta[x] - \delta''[x]$$

A  discrete  version  of  the  impulse  response  may  be  generated  by  substituting  the 
discrete  Dirac  delta  function  and  the  “centered”  discrete  second  derivative  operator: 


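$$\delta_d[n] - \partial_x^2 = \begin{bmatrix} 0 & +1 & 0 \end{bmatrix} - \begin{bmatrix} +1 & -2 & +1 \end{bmatrix} = \begin{bmatrix} -1 & +3 & -1 \end{bmatrix}$$

$$= \begin{bmatrix} 0 & +4 & 0 \end{bmatrix} - \begin{bmatrix} +1 & +1 & +1 \end{bmatrix} = 4\cdot\delta_d[n] - \mathrm{RECT}\left[\frac{n}{3}\right]$$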


The  transfer  function  of  the  discrete  sharpener  is: 


$$H[k] = 4\cdot 1[k] - \left(1 + 2\cos\left[2\pi\frac{k}{N}\right]\right) = 3 - 2\cos\left[2\pi\frac{k}{N}\right]$$
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The  amplitudes  of  the  transfer  function  at  DC  and  at  the  Nyquist  frequency  are: 


$$H[k = 0] = +1, \qquad H\left[k = -\frac{N}{2}\right] = +5$$


In  words,  the  “second-derivative  sharpener”  amplifies  the  amplitude  of  the  sinusoidal 
component  that  oscillates  at  the  Nyquist  frequency  by  a  factor  of  5. 

The  action  of  this  sharpening  operator  on  a  “blurry”  edge  is  shown  in  the  figure. 
The  slope  of  the  edge  is  “steeper”  after  sharpening,  but  the  edge  also  “overshoots” 
the  correct  amplitude  at  both  sides. 
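The steepening and the overshoot can be reproduced on a small example. This is a sketch in which the “blurry” edge is a hypothetical ramp (a step blurred by the weights 1, 2, 3, 2, 1 divided by 9), N = 64 is an arbitrary array size for the transfer function, and the sharpener is the kernel [-1, +3, -1].

```python
import numpy as np

sharpener = np.array([-1.0, 3.0, -1.0])

# Hypothetical blurry edge rising from 0 to 1 over several samples.
blurry = np.array([0, 0, 0, 1, 3, 6, 8, 9, 9, 9]) / 9
sharpened = np.convolve(blurry, sharpener, mode="valid")   # interior samples only

print(sharpened * 9)                  # [ 0. -1.  0.  2.  7.  9. 10.  9.]
print(np.diff(blurry).max())          # steepest slope before sharpening (3/9)
print(np.diff(sharpened).max())       # steepest slope after sharpening (5/9)

# Transfer function of the sharpener on an N-sample periodic array.
N = 64
h = np.zeros(N)
h[[-1, 0, 1]] = sharpener
H = np.fft.fft(h).real
print(H[0], H[N // 2])                # amplitudes at DC and at the Nyquist frequency
```

The output dips below 0 just before the edge and rises above 1 just after it, which is the overshoot on both sides.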
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Slope  of 
Blurred  Edge 


Action of 1-D 2nd-derivative sharpening operator on a “blurry” edge. The slope of the sharpened edge is “steeper,” but the amplitude “overshoots” the correct value on both sides of the edge. The output is not the ideal sharp edge, so this operator only approximates the ideal inverse filter.

This  interpretation  may  be  extended  to  derive  other  1-D  sharpeners  by  computing 
the  difference  between  a  scaled  replica  of  the  “original”  (blurred)  input  image  and  an 
image  obtained  by  passing  through  a  different  lowpass  filter. 

2-D  Sharpening  Operators 

We can generalize the 1-D discussion to produce a 2-D sharpener based on the Laplacian.
The discrete version often is used to sharpen digital images that have been
blurred by unknown lowpass filters. The process is ad hoc; it is not “tuned” to the
details of the blurring process and so is not an “inverse” filter. It cannot generally
reconstruct the original sharp image, but it “steepens” the slope of pixel-to-pixel
changes in gray level, thus making the edges appear “sharper.” The 2-D continuous
Laplacian sharpener has the form:

\[ f[x,y,0] = f[x,y,z] - \alpha \cdot \nabla^2 f[x,y,z] \]

where $\alpha$ is a real-valued free parameter that allows the sharpener to be “tuned” to
the amount of blur. The corresponding discrete solution is:

\[ g[n,m] = f[n,m] - \alpha \cdot \nabla_d^2 f[n,m] = \left(\delta_d[n,m] - \alpha \cdot \nabla_d^2[n,m]\right) * f[n,m] \]

where $\nabla_d^2[n,m]$ is a Laplacian kernel that may be selected from the variants already
considered. A single discrete sharpening kernel $h[n,m]$ may be constructed from the
simplest form for the Laplacian:

\[ h_1[n,m;\alpha] = \delta_d[n,m] - \alpha \cdot \nabla_d^2[n,m] = \begin{bmatrix} 0 & -\alpha & 0 \\ -\alpha & 1+4\alpha & -\alpha \\ 0 & -\alpha & 0 \end{bmatrix} \]

The parameter $\alpha$ may be increased to enhance the sharpening by steepening the slope
of the edge profile and also increasing the “overshoot.” Selection of $\alpha = +1$ produces
a commonly used sharpening kernel:

\[ h_1[n,m;\alpha=+1] = \begin{bmatrix} 0 & -1 & 0 \\ -1 & +5 & -1 \\ 0 & -1 & 0 \end{bmatrix} \]


The  weights  in  the  kernel  sum  to  unity,  which  means  that  the  average  gray  value  of 
the  image  is  preserved.  In  words,  this  process  amplifies  differences  in  gray  level  of 
adjacent  pixels  while  preserving  the  mean  gray  value. 

The corresponding discrete transfer function for the parameter $\alpha$ is:

\[ H_1[k,\ell;\alpha] = (1+4\alpha) - 2\alpha\left(\cos\left[\frac{2\pi k}{N}\right] + \cos\left[\frac{2\pi \ell}{N}\right]\right) \]

In the case $\alpha = +1$, the resulting transfer function is:

\[ H_1[k,\ell;\alpha=1] = 5 - 2\left(\cos\left[\frac{2\pi k}{N}\right] + \cos\left[\frac{2\pi \ell}{N}\right]\right) \]

which has its maximum amplitude of $(H_1)_{max} = 9$ at the corners of the array.
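As a check on the kernel weights and the transfer function, the sketch below builds the 3×3 sharpening kernel for a given weight, confirms that its elements sum to unity, and compares a direct DFT of the kernel with the closed-form expression. It assumes the four-neighbor form of the discrete Laplacian and an array size of N = 8; the function names are illustrative.

```python
import math

def laplacian_sharpener(alpha):
    """Identity minus alpha times the four-neighbor Laplacian (assumed form)."""
    return [[0, -alpha, 0],
            [-alpha, 1 + 4 * alpha, -alpha],
            [0, -alpha, 0]]

def transfer_function(alpha, k, l, N):
    """Closed-form transfer function of the sharpener."""
    return (1 + 4 * alpha) - 2 * alpha * (math.cos(2 * math.pi * k / N)
                                          + math.cos(2 * math.pi * l / N))

def dft_of_kernel(h, k, l, N):
    """Direct DFT of a symmetric 3x3 kernel centered at the origin (real part)."""
    total = 0.0
    for r in range(3):
        for c in range(3):
            total += h[r][c] * math.cos(2 * math.pi * (k * (c - 1) + l * (r - 1)) / N)
    return total

h1 = laplacian_sharpener(1)
print(sum(sum(row) for row in h1))    # weights sum to unity
print(transfer_function(1, 4, 4, 8))  # gain at the corner of the array (Nyquist in both directions)
```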
A  sharpening  operator  also  may  be  derived  from  the  isotropic  Laplacian: 


\[ h_2[n,m;\alpha] = \begin{bmatrix} -\alpha & -\alpha & -\alpha \\ -\alpha & 1+8\alpha & -\alpha \\ -\alpha & -\alpha & -\alpha \end{bmatrix} \]

Again, the sum of the elements in the kernel is unity, ensuring that the average gray
value of the image is preserved by the action of the sharpener. If the weighting factor
is again selected to be unity, the kernel is the difference of a scaled original and a
3×3 blurred copy:

\[ h_2[n,m;\alpha=+1] = \begin{bmatrix} -1 & -1 & -1 \\ -1 & +9 & -1 \\ -1 & -1 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & +10 & 0 \\ 0 & 0 & 0 \end{bmatrix} - \begin{bmatrix} +1 & +1 & +1 \\ +1 & +1 & +1 \\ +1 & +1 & +1 \end{bmatrix} \]

This type of process has been called unsharp masking by photographers. A sandwich
of transparencies of the original image and a blurred negative produces a sharpened
image of the original. This difference of the blurred image and the original is easily
implemented in a digital system as a single convolution.


An example of 2-D sharpening is shown in the figure. Note the “overshoots” at
the edges in the sharpened image. The center weight of +9 means that the dynamic range
of the sharpened image can be as large as from +9 to -8 times the maximum gray
value, or $-2040 \le f \le +2295$ for an 8-bit image. These extremes occur only for an
isolated bright pixel at the maximum surrounded by a neighborhood of black pixels,
and vice versa. In actual use, the range of values is considerably smaller. The image
gray values either have to be biased up and rescaled or “clipped” at the maximum
and minimum, as was done here.
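The extreme output values can be reproduced with a very small test image. This sketch assumes the sharpening kernel with a center weight of +9 and eight neighbors of -1, applies it to an isolated bright pixel and to an isolated black pixel in an 8-bit image, and then clips the result to the original range. The helper names and the edge-replication rule are illustrative assumptions.

```python
ISOTROPIC_SHARPENER = [[-1, -1, -1],
                       [-1,  9, -1],
                       [-1, -1, -1]]

def convolve_2d(f, h):
    """Same-size convolution with a 3x3 kernel; border pixels are replicated."""
    rows, cols = len(f), len(f[0])
    g = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for i in range(3):
                for j in range(3):
                    rr = min(max(r - (i - 1), 0), rows - 1)
                    cc = min(max(c - (j - 1), 0), cols - 1)
                    acc += h[i][j] * f[rr][cc]
            g[r][c] = acc
    return g

def clip(g, low=0, high=255):
    """Clip gray values to the original dynamic range."""
    return [[min(max(v, low), high) for v in row] for row in g]

bright = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]            # isolated bright pixel
dark = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]  # isolated black pixel
g_bright = convolve_2d(bright, ISOTROPIC_SHARPENER)
g_dark = convolve_2d(dark, ISOTROPIC_SHARPENER)
print(g_bright[1][1], g_dark[1][1])
```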
Action of the 2-D sharpening operator based on the Laplacian. The original image
f[n,m] has been blurred by a 3×3 uniform averager to produce g[n,m] = f[n,m] * h[n,m].
The action of the 3×3 Laplacian sharpener on g[n,m] produced the bipolar image $\hat{f}[n,m]$,
which was clipped at the original dynamic range. The “overshoots” at the edges
give the impression of a sharper image.

16.2.11  2-D  Gradient 

The gradient of a 2-D continuous function $f[x,y]$ also was defined in the discussion
of electromagnetism. It constructs a 2-D vector at each location in a scalar (grayscale)
image. The x- and y-components of the vector at each location are the x- and
y-derivatives of the image:

\[ \mathbf{g}[x,y] = \nabla f[x,y] = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right] \]

The image $f[n,m]$ is a scalar function which assigns a numerical gray value $f$ to
each coordinate $[n,m]$. The gray value $f$ is analogous to terrain “elevation” in a
map. This process calculates a vector at each coordinate $[x,y]$ of the scalar image
whose Cartesian components are $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$. Note that the 2-D vector $\nabla f$ may be
represented in polar form as magnitude $|\nabla f|$ and direction $\Phi\{\nabla f\}$:

\[ |\nabla f[n,m]| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2} \]

\[ \Phi\{\nabla f[n,m]\} = \tan^{-1}\left[ \frac{\partial f / \partial y}{\partial f / \partial x} \right] \]

The 2-D vector at each location has the values:

\[ \mathbf{g}[n,m] = \nabla f[n,m] = \begin{bmatrix} \partial_x * f[n,m] \\ \partial_y * f[n,m] \end{bmatrix} \]

This vector points “uphill” in the direction of the maximum “slope” in gray level.
The magnitude $|\nabla f|$ is the “slope” of the 3-D surface $f$ at pixel $[n,m]$. The azimuth
angle (often called the phase by analogy with complex numbers) of the gradient
$\Phi\{\nabla f[n,m]\}$ is the compass direction toward which the slope points “uphill.”


The discrete version of the gradient magnitude also is a useful operator in digital
image processing, as it will take on extreme values at edges between objects. The
magnitude of the gradient often is approximated as the sum of the magnitudes of the
components:

\[ |\nabla f[n,m]| = \sqrt{(\partial_x * f[n,m])^2 + (\partial_y * f[n,m])^2} \cong |\partial_x * f[n,m]| + |\partial_y * f[n,m]| \]

The magnitude of the gradient is not a linear operator, and thus can neither be evaluated
as a convolution nor described by a transfer function. The largest values of the magnitude
of the gradient correspond to the pixels where the gray value “jumps” by the largest
amount, and thus the thresholded magnitude of the gradient may be used to identify
such pixels. In this way the gradient may be used as an “edge detection operator.”
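A short sketch of gradient-based edge detection follows. It assumes central differences for the two derivative kernels (the text does not specify which variants are used here), sets the border pixels to zero, and uses a small vertical step edge as the test image; all names are illustrative. It also shows numerically that the magnitude is nonlinear: the magnitude of the gradient of a sum of two images need not equal the sum of the magnitudes.

```python
import math

def gradient(f):
    """Central-difference x- and y-derivatives; border pixels are left at zero."""
    rows, cols = len(f), len(f[0])
    gx = [[0.0] * cols for _ in range(rows)]
    gy = [[0.0] * cols for _ in range(rows)]
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            gx[m][n] = (f[m][n + 1] - f[m][n - 1]) / 2
            gy[m][n] = (f[m + 1][n] - f[m - 1][n]) / 2  # row index increases downward
    return gx, gy

def magnitude(gx, gy):
    """Exact magnitude: root of the sum of the squared components."""
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]

def magnitude_l1(gx, gy):
    """Approximation: sum of the magnitudes of the components."""
    return [[abs(a) + abs(b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]

def threshold(img, level):
    return [[1 if v >= level else 0 for v in row] for row in img]

edge = [[0, 0, 0, 1, 1, 1] for _ in range(5)]  # vertical step edge
gx, gy = gradient(edge)
mag = magnitude(gx, gy)
edges = threshold(mag, 0.25)

# Nonlinearity: edge + (-edge) is a blank image with zero gradient magnitude,
# but the two magnitudes add to a nonzero value.
negative = [[-v for v in row] for row in edge]
mag_neg = magnitude(*gradient(negative))
print(edges[2], mag[2][2] + mag_neg[2][2])
```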
Example of the discrete gradient operator $\nabla f[n,m]$. The original object is the
nonnegative function f[n,m] shown in (a), which has amplitude in the interval
$0 \le f \le +1$. The gradient at each pixel is the 2-D vector with bipolar components
$\left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]$. The two component images are shown in (b) and (c). These also may be
displayed as the magnitude $\sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}$ and the angle $\phi = \tan^{-1}\left[\frac{\partial f/\partial y}{\partial f/\partial x}\right]$. The
extrema of the magnitude are located at corners and edges in f[n,m].

16.2.12  Pattern  Matching 

It  often  is  useful  to  design  kernels  to  locate  specific  gray-level  patterns,  such  as  edges 
at  particular  orientations,  corners,  isolated  pixels,  particular  shapes,  or  what  have 
you.  Particularly  in  the  early  days  of  digital  image  processing  when  computers  were 
less  capable  than  they  are  today,  the  computational  intensity  of  the  calculation  often 
was an important issue. It was desirable to find the least intensive method for common
tasks  such  as  pattern  detection,  which  generally  meant  that  the  task  was  performed 
in  the  space  domain  using  a  small  convolution  kernel  rather  than  calculating  a  better 
approximation  to  the  ideal  result  in  the  frequency  domain.  That  said,  the  process  of 
designing  and  applying  a  pattern-matching  kernel  illuminates  some  of  the  concepts 
and  thus  is  worth  some  time  and  effort. 

A common technique for pattern matching convolves the input image with a kernel
that is the same size as the “reference” pattern. The process and its limitations will
be illustrated by example. Consider an input image $f[n,m]$ that is composed of two
replicas of some real-valued nonnegative pattern of gray values, $p[n,m]$, centered at
coordinates $[n_1,m_1]$ and $[n_2,m_2]$ with respective amplitudes $A_1$ and $A_2$. The image
also includes a bias $b \cdot 1[n,m]$:

\[ f[n,m] = A_1 \cdot p[n-n_1, m-m_1] + A_2 \cdot p[n-n_2, m-m_2] + b \cdot 1[n,m] \]

The appropriate kernel of the discrete filter is:

\[ m[n,m] = p[-n,-m] \]

which also is real valued and nonnegative within its region of support. The output
from this matched filter is the autocorrelation of the pattern centered at those coordinates:

\[
\begin{aligned}
g[n,m] &= f[n,m] * m[n,m] \\
&= A_1 \cdot \left. p[n,m] \star p[n,m] \right|_{n=n_1,\, m=m_1} + A_2 \cdot \left. p[n,m] \star p[n,m] \right|_{n=n_2,\, m=m_2} + b \cdot \left(1[n,m] * p[-n,-m]\right) \\
&= A_1 \cdot \left. p[n,m] \star p[n,m] \right|_{n=n_1,\, m=m_1} + A_2 \cdot \left. p[n,m] \star p[n,m] \right|_{n=n_2,\, m=m_2} + b \cdot \sum_{n,m} p[n,m]
\end{aligned}
\]

The  last  term  is  the  constant  output  level  from  the  convolution  of  the  bias  with  the 
matched  filter,  which  produces  the  sum  of  the  product  of  the  bias  and  the  weights  at 
each sample. The spatially varying autocorrelation functions rest on a bias proportional
to the sum of the gray values p in the pattern. If the output bias is large, it
can  reduce  the  “visibility”  of  small  (but  significant)  variations  in  the  autocorrelation 
in  exactly  the  same  way  as  small  modulations  of  a  nonnegative  sinusoidal  function 
with  a  large  bias  are  difficult  to  see.  It  is  therefore  convenient  to  construct  a  matched 
filter  kernel  whose  weights  sum  to  zero.  It  only  requires  subtraction  of  the  average 
value  from  each  sample  of  the  kernel: 

\[ m[n,m] = p[-n,-m] - p_{average} \]

\[ \Longrightarrow \sum_{n,m} m[-n,-m] = \sum_{n,m} m[n,m] = 0 \]

thus ensuring that the constant bias vanishes. This result determines the strategy
for designing convolution kernels that produce outputs that have large magnitudes at
pixels centered on neighborhoods that contain these patterns and small magnitudes
in neighborhoods where the feature does not exist. For example, consider an image
containing an “upper-right corner” of a brighter object on a darker background:


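\[ f[n,m] = \begin{bmatrix} 50 & 50 & 50 & 50 & 50 \\ 50 & 50 & 50 & 50 & 50 \\ 100 & 100 & 100 & 50 & 50 \\ 100 & 100 & 100 & 50 & 50 \\ 100 & 100 & 100 & 50 & 50 \end{bmatrix} \]

The task is to design a 3×3 kernel for “locating” this pattern:

\[ p[n,m] = \begin{bmatrix} 50 & 50 & 50 \\ 100 & 100 & 50 \\ 100 & 100 & 50 \end{bmatrix} \]

In other words, we want to construct an operator that produces a large output value
when it is centered over this “upper-right corner” pattern. The recipe for convolution
tells us to rotate the pattern by $\pi$ radians about its center to create $p[-n,-m]$:

\[ p[-n,-m] = \begin{bmatrix} 50 & 100 & 100 \\ 50 & 100 & 100 \\ 50 & 50 & 50 \end{bmatrix} \]

The average weight in this 3×3 kernel is $\frac{650}{9} \cong 72.222$, which is subtracted from each
element:

\[ \begin{bmatrix} -22.222 & +27.778 & +27.778 \\ -22.222 & +27.778 & +27.778 \\ -22.222 & -22.222 & -22.222 \end{bmatrix} = (+22.222) \begin{bmatrix} -1 & +\frac{5}{4} & +\frac{5}{4} \\ -1 & +\frac{5}{4} & +\frac{5}{4} \\ -1 & -1 & -1 \end{bmatrix} \]

The multiplicative factor may be ignored since it just scales the output of the convolution
by this constant. Thus one realization of the unamplified 3×3 matched filter
for upper-right corners is: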




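\[ m[n,m] = \begin{bmatrix} -1 & +\frac{5}{4} & +\frac{5}{4} \\ -1 & +\frac{5}{4} & +\frac{5}{4} \\ -1 & -1 & -1 \end{bmatrix} \]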


Though it is not really an issue any longer (given the advanced state of computing
technology), it was once more convenient to restrict the weights in the kernel to
integer values so that all calculations were performed by integer arithmetic. This
may be achieved by redistributing the weights slightly. In this example, the fractional
part of the positive weights often is concentrated in the center pixel to produce the Prewitt
corner detector:

\[ m[n,m] = \begin{bmatrix} -1 & +1 & +1 \\ -1 & +2 & +1 \\ -1 & -1 & -1 \end{bmatrix} \]

Note that the upper-right corner detector contains a bipolar pattern that looks
like a lower-left corner because of the rotation (“reversal”) inherent in the convolution.
Because $m$ is bipolar, so generally is the output of the convolution with the input
$f[n,m]$. The linearity of convolution ensures that the output amplitude at a pixel is
proportional to the contrast of the feature.
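The construction of a zero-sum matched filter and its response to contrast can be sketched as follows. The gray levels of 50 and 100, the 5×5 test image, and the integer corner kernel are assumptions chosen to agree with the average of 72.222 and the scale factor of 22.222 quoted in the text; the function names are illustrative.

```python
def matched_filter(p):
    """Rotate the pattern by pi and subtract its mean so the weights sum to zero."""
    flat = [v for row in p for v in row]
    mean = sum(flat) / len(flat)
    return [[v - mean for v in reversed(row)] for row in reversed(p)]

def convolve_2d(f, h):
    """Same-size convolution with a 3x3 kernel; border pixels are replicated."""
    rows, cols = len(f), len(f[0])
    g = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for i in range(3):
                for j in range(3):
                    rr = min(max(r - (i - 1), 0), rows - 1)
                    cc = min(max(c - (j - 1), 0), cols - 1)
                    g[r][c] += h[i][j] * f[rr][cc]
    return g

def corner_image(background, level):
    """5x5 image whose object fills the lower-left 3x3 block (corner at [2][2])."""
    return [[level if (r >= 2 and c <= 2) else background for c in range(5)]
            for r in range(5)]

PATTERN = [[50, 50, 50], [100, 100, 50], [100, 100, 50]]  # assumed gray levels
CORNER_KERNEL = [[-1, 1, 1], [-1, 2, 1], [-1, -1, -1]]    # assumed integer kernel

m = matched_filter(PATTERN)
g = convolve_2d(corner_image(50, 100), CORNER_KERNEL)
print(round(m[0][0], 3), round(m[0][1], 3))  # weights before rescaling
print(g[2][2])                               # response at the corner pixel
```

Because the integer kernel sums to zero, a constant image produces zero output everywhere, and doubling the contrast of the corner doubles the response at the corner pixel.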

If  the  contrast  of  the  upper-right  corner  is  large  and  “positive,”  meaning  that 
the  corner  is  much  brighter  than  the  dark  background,  the  output  at  the  corner 
pixel  will  be  a  large  and  positive  extremum.  Conversely,  a  dark  object  on  a  very 
bright  background  will  produce  a  large  negative  extremum.  The  magnitude  of  the 
image  shows  the  locations  of  features  with  either  contrast.  The  output  image  may  be 
thresholded  to  specify  the  pixels  located  at  the  desired  feature. 

This  method  of  feature  detection  is  not  ideal.  The  output  of  this  unamplified  filter 
at  a  corner  is  the  autocorrelation  of  the  feature  rather  than  the  ideal  2-D  discrete  Dirac 
delta  function.  If  multiple  copies  of  the  pattern  with  different  contrasts  are  present 
in  the  input,  it  will  be  difficult  or  impossible  to  segment  the  desired  features  by 
thresholding  the  convolution  alone.  Another  consequence  of  the  unamplified  matched 
filter  is  that  features  other  than  the  desired  pattern  produce  nonnull  outputs,  as 
shown in the output of the corner detector applied to a test object consisting of “E”
at two different amplitudes, as shown in the figure. The threshold properly locates the
upper-right corners of the bright “E” and one point on the sampled circle, but misses
the corners of the fainter “E”. This shows that corners of some objects are missed
(false negatives). If the threshold were set at a lower level to detect the corner of the
fainter “E”, other pixels would be incorrectly identified as corners (false positives). A
simple method for reducing misidentified pixels is considered in the next section.
Thresholding to locate features in the image: (a) f[n,m], which is the nonnegative
function with $0 \le f \le 1$; (b) f[n,m] convolved with the “upper-right corner
detector”, producing the bipolar output g[n,m] where $-5 \le g \le 4$. The largest
amplitudes occur at the upper-right corners, as shown in the image thresholded at
level 4, shown in (c) along with the “ghost” of the original image. This
demonstrates that the upper-right corners of the high-contrast “E” and of the circle
were detected, but the corner of the low-contrast “E” was missed.
16.2.13  Normalization  of  Contrast  of  Detected  Features


The recipe just developed allows creation of kernels for detecting pixels in neighborhoods
that are “similar” to some desired pattern. However, the sensitivity of the
process to feature contrast that was also demonstrated can significantly limit its usefulness.
A simple modification to “normalize” the correlation measure can improve
the classification significantly. Ernest Hall defined the normalized correlation measure
$R[n,m]$:

\[ R[n,m] = \frac{f[n,m] * h[n,m]}{\sqrt{\sum_{n,m} (f[n,m])^2} \cdot \sqrt{\sum_{n,m} (h[n,m])^2}} \]

where the sums in the denominator are over ONLY the elements of the kernel. The
sum of the squares of the elements of the kernel $h[n,m]$ results in a constant scale
factor $k$ and may be ignored, thus producing the formula:

\[ R[n,m] = \frac{f[n,m] * h[n,m]}{\sqrt{\sum_{n,m} (f[n,m])^2}} \]

where again the summation is ONLY over the size of the kernel. In words, this
operation divides the convolution by the geometric sum (the square root of the sum of
the squares) of gray levels under the kernel and by the geometric sum of the elements
of the kernel. The modification to the filter makes the entire process shift variant and
thus it may not be performed by a simple convolution. The denominator may be computed
by convolving $(f[n,m])^2$ with a uniform averaging kernel $s[n,m]$ of the same size as the
original kernel $h[n,m]$ and then evaluating the square root:

\[ R[n,m] = k \cdot \frac{f[n,m] * h[n,m]}{\sqrt{(f[n,m])^2 * s[n,m]}} \]

The upper-right corner detector with normalization is shown in the figure, where the
features of both “E”s are located with a single threshold.
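The effect of the normalization can be illustrated with two copies of the same corner at different contrasts. The sketch below divides the output of an assumed integer corner kernel by the square root of the sum of squared gray values under the kernel and by the square root of the sum of squared kernel weights. The test images (a corner of amplitude 1.0 or 0.25 on a zero background), the threshold of 0.6, and the function names are illustrative assumptions; only interior pixels are evaluated.

```python
import math

CORNER_KERNEL = [[-1, 1, 1], [-1, 2, 1], [-1, -1, -1]]  # assumed corner detector

def corner_image(level):
    """5x5 image with an object of the given level on a zero background."""
    return [[level if (r >= 2 and c <= 2) else 0.0 for c in range(5)]
            for r in range(5)]

def correlate(f, h, normalize):
    """Convolution output at interior pixels, optionally normalized."""
    rows, cols = len(f), len(f[0])
    h_norm = math.sqrt(sum(w * w for row in h for w in row))
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            num, energy = 0.0, 0.0
            for i in range(3):
                for j in range(3):
                    v = f[r - (i - 1)][c - (j - 1)]
                    num += h[i][j] * v
                    energy += v * v
            if not normalize:
                out[r][c] = num
            elif energy > 0:
                out[r][c] = num / (math.sqrt(energy) * h_norm)
    return out

def detect(img, level):
    return [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v >= level]

bright, faint = corner_image(1.0), corner_image(0.25)
raw_bright = correlate(bright, CORNER_KERNEL, normalize=False)
raw_faint = correlate(faint, CORNER_KERNEL, normalize=False)
R_bright = correlate(bright, CORNER_KERNEL, normalize=True)
R_faint = correlate(faint, CORNER_KERNEL, normalize=True)
print(raw_bright[2][2], raw_faint[2][2])  # raw outputs scale with contrast
print(detect(R_bright, 0.6), detect(R_faint, 0.6))
```

With these assumptions the raw outputs at the two corners differ by the ratio of the contrasts, while the normalized outputs are equal, so one threshold finds both corners.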
The action of the nonlinear normalization of detected features using the same object:
(a) f[n,m] convolved with the upper-right corner detector with normalization by the
image amplitude, producing the bipolar output g[n,m] where $-0.60 \le g \le +0.75$;
(b) image after thresholding at +0.6, with a “ghost” of the original image, showing
the detection of the upper-right corners of both “E”s despite the different image
contrasts.


16.3  Nonlinear  Filters 


Any  filtering  operation  that  cannot  be  evaluated  as  a  convolution  must  be  either 
nonlinear  or  space  variant,  or  both. 


16.3.1  Median  Filter 

Probably the most useful nonlinear statistical filter is the local median, i.e., the gray
value of the output pixel is the median of the gray values in a neighborhood, which
is obtained by sorting the gray values in numerical order and selecting the middle
value. To illustrate, consider a 3×3 neighborhood centered on the value “3”: if the
middle of the 9 values sorted in numerical order is “2”, then that median value
replaces the “3” in the center of the window.

The nonlinear nature of the median can be recognized by noting that the median of
the sum of two images is generally not equal to the sum of the medians. For example,
suppose that the median of a second 3×3 neighborhood is “3”. The sum of the two
medians is 2 + 3 = 5, but the sum of the two 3×3 neighborhoods can produce a third
neighborhood whose median is “6”, confirming that the median of the sum is not the
sum of the medians and that the median is a nonlinear operation.
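The nonlinearity can be verified directly. The two 3×3 neighborhoods below are hypothetical values chosen only so that their medians are 2 and 3, matching the numbers quoted in the text; they are not the arrays from the original figures.

```python
from statistics import median

def neighborhood_median(window):
    """Median of all gray values in a 2-D window."""
    return median(v for row in window for v in row)

# Hypothetical neighborhoods; the first is centered on the value 3.
A = [[1, 2, 6],
     [2, 3, 5],
     [1, 2, 4]]
B = [[0, 1, 4],
     [2, 3, 5],
     [1, 4, 6]]
A_plus_B = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

print(neighborhood_median(A), neighborhood_median(B), neighborhood_median(A_plus_B))
```

For these values the median of the sum is 6, while the sum of the medians is 5.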

The median requires sorting and thus may not be computed as a convolution. Its
computation typically requires more time than a mean filter, but it has the advantage
of reducing the modulation of signals that vary or oscillate over a period less than the
width of the window while preserving the gray values of signals which are constant or
monotonically varying on a scale larger than the window size. This implies that the
variance of additive noise will be reduced by the median in a fashion similar to the
mean filter, while preserving sharp transitions in gray value. Also note that, unlike
the mean filter, all gray values generated by the median exist in the original image,
thus obviating the need for requantization.
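These properties can be seen on a short 1-D signal. The sketch below applies a three-sample median and a three-sample mean to an illustrative signal containing one isolated spike followed by a step edge; the signal, the window width, and the edge-replication rule are assumptions made for the example.

```python
from statistics import median

def filter_1d(f, width, statistic):
    """Apply a sliding-window statistic; end samples are replicated."""
    half = width // 2
    out = []
    for n in range(len(f)):
        window = [f[min(max(n + k, 0), len(f) - 1)] for k in range(-half, half + 1)]
        out.append(statistic(window))
    return out

def mean(window):
    return sum(window) / len(window)

signal = [0, 0, 0, 9, 0, 0, 5, 5, 5, 5]  # spike at n = 3, step edge at n = 6
med = filter_1d(signal, 3, median)
avg = filter_1d(signal, 3, mean)
print(med)
print(avg)
```

The median removes the spike, leaves the step edge in place, and outputs only values that were present in the input; the mean spreads the spike over three samples, blurs the edge, and creates new gray values.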

The statistics of the median-filtered image depend on the probability density function
of the input signal, including the deterministic part and any noise. Thus predictions
of the effect of the filter cannot be as specific as for the mean filter, i.e., given
an input image with known statistics (mean, variance, etc.), the statistics of the output
image are more difficult to predict. However, Frieden [Probability, Statistical
Optics, and Data Testing, Springer-Verlag, 1983, pp. 254-258] has analyzed the
statistical properties of the median filter by modeling it as a limit of a large number
of discrete trials of a binomial probability distribution (Bernoulli trials). The median
of N samples (odd number) for a set of gray values $f_n$ taken from an input distribution
with probability law (i.e., histogram) $p_f[x]$ must be determined. Frieden applied
the principles of Bernoulli trials to determine the probability density of the median
of several independent sets of numbers. In other words, he sought to determine the
probability that the median of the N numbers $\{f_n\}$ is x by evaluating the median
of many independent such sets of N numbers selected from a known probability distribution
$p_f[x]$. Frieden reasoned that, for each placement of the median window, a
specific amplitude $f_n$ of the N values is the median if three conditions are satisfied:

1. one of the N numbers satisfies the condition $x \le f_n < x + \Delta x$,

2. of the remaining N - 1 numbers, $\frac{N-1}{2}$ exceed x, and

3. $\frac{N-1}{2}$ of the remaining numbers are less than x.


The probability of the simultaneous occurrence of these three events is the probability
density of the output of the median window. For an arbitrary x, any one value
$f_n$ must either lie in the interval $(x \le f_n < x + \Delta x)$, be larger than x, or be less than
x. In other words, each trial has three possible outcomes. These conditions define a
sequence of Bernoulli trials with three outcomes, which is akin to the task of flipping
a “three-sided” coin where the probabilities of the three outcomes are not equal. In
the more familiar case, the probability that N coin flips with two possible outcomes
that have associated probabilities p and q = 1 - p will produce m “successes” (say, m heads)
is:

\[ P_N[m] = \frac{N!}{(N-m)!\, m!}\, p^m (1-p)^{N-m} \]

The formula is easy to extend to the more general case of three possible outcomes;
the probability that the result yields $m_1$ instances of the first possible outcome (say,
“head #1”), $m_2$ of the second outcome (“head #2”) and $m_3 = N - (m_1 + m_2)$ of the
third (“tails”) is

\[
\begin{aligned}
P_N[m_1, m_2, m_3] &= P_N[m_1, m_2, N - (m_1 + m_2)] \\
&= \frac{N!}{m_1!\, m_2!\, (N - (m_1 + m_2))!}\, p_1^{m_1} p_2^{m_2} p_3^{m_3} \\
&= \frac{N!}{m_1!\, m_2!\, (N - (m_1 + m_2))!}\, p_1^{m_1} p_2^{m_2} \left(1 - (p_1 + p_2)\right)^{N - (m_1 + m_2)}
\end{aligned}
\]

where $p_1$, $p_2$, and $p_3 = 1 - (p_1 + p_2)$ are the respective probabilities of the three
outcomes.

When applied to one sample of data, the median filter has three possible outcomes
whose probabilities are known:

1. the sample amplitude may be the median (probability $p_1$),

2. the sample amplitude may be smaller than the median (probability $p_2$), and

3. it may be larger than the median (probability $p_3$).

\[
\begin{aligned}
p_1 &= P[x \le f_n \le x + \Delta x] = p_f[x]\, dx \\
p_2 &= P[f_n < x] = C_f[x] \\
p_3 &= P[f_n > x] = 1 - C_f[x]
\end{aligned}
\]

where $C_f[x]$ is the cumulative probability distribution of the continuous probability
density function $p_f[x]$:

\[ C_f[x] = \int_{-\infty}^{x} p_f[\alpha]\, d\alpha \]

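In this case, the distributions are continuous (rather than discrete), so the probability
is the product of the probability density function $p_{med}[x]$ and the infinitesimal element
dx. We substitute the known probabilities and the known number of occurrences of
each into the Bernoulli formula for three outcomes:

\[
\begin{aligned}
p_{med}[x]\, dx &= \frac{N!}{\left(\frac{N-1}{2}\right)! \left(\frac{N-1}{2}\right)!\, 1!} \left(C_f[x]\right)^{\frac{N-1}{2}} \cdot \left(1 - C_f[x]\right)^{\frac{N-1}{2}} \cdot p_f[x]\, dx \\
&= \frac{N!}{\left(\left(\frac{N-1}{2}\right)!\right)^2} \left(C_f[x]\right)^{\frac{N-1}{2}} \cdot \left(1 - C_f[x]\right)^{\frac{N-1}{2}} \cdot p_f[x]\, dx
\end{aligned}
\]

If the window includes N = 3, 5, or 9 values, the following probability laws for the
median result:

\[
\begin{aligned}
N = 3 &\Longrightarrow p_{med}[x]\, dx = \frac{3!}{(1!)^2} \left(C_f[x]\right)^1 \cdot \left(1 - C_f[x]\right)^1 \cdot p_f[x]\, dx = 6\, C_f[x] \cdot \left(1 - C_f[x]\right) p_f[x]\, dx \\
N = 5 &\Longrightarrow p_{med}[x]\, dx = \frac{5!}{(2!)^2} \left(C_f[x]\right)^2 \cdot \left(1 - C_f[x]\right)^2 p_f[x]\, dx = 30 \left(C_f[x]\right)^2 \cdot \left(1 - C_f[x]\right)^2 p_f[x]\, dx \\
N = 9 &\Longrightarrow p_{med}[x]\, dx = 630 \left(C_f[x]\right)^4 \cdot \left(1 - C_f[x]\right)^4 p_f[x]\, dx
\end{aligned}
\]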


Example: Median Filter Applied to Uniform Distribution

The statistical properties of the median will now be demonstrated for some simple
examples of known probabilities. If the original pdf $p_f[x]$ is uniform over the interval
[0, 1], then it may be written as a rectangle function:

\[ p_f[x] = \mathrm{RECT}\left[x - \tfrac{1}{2}\right] \]

pdf of noise that is uniformly distributed over the interval [0, 1] and its associated
cumulative probability distribution $C_f[x] = x \cdot \mathrm{RECT}\left[x - \tfrac{1}{2}\right] + \mathrm{STEP}[x - 1]$

The associated cumulative probability distribution may be written in several ways,
including:

\[ C_f[x] = x \cdot \mathrm{RECT}\left[x - \tfrac{1}{2}\right] + \mathrm{STEP}[x - 1] \]

so the product of the cumulative distribution and its complement is windowed by the
rectangle to yield:

\[ p_{med}[x]\, dx = \frac{N!}{\left(\left(\frac{N-1}{2}\right)!\right)^2}\, x^{\frac{N-1}{2}} (1 - x)^{\frac{N-1}{2}}\, \mathrm{RECT}\left[x - \tfrac{1}{2}\right] dx \]

The pdfs of the output of median filters for N = 3, 5, and 9 are:

\[
\begin{aligned}
N = 3 &\Longrightarrow p_{median}[x]\, dx = 6 \left(x - x^2\right) \mathrm{RECT}\left[x - \tfrac{1}{2}\right] dx \\
N = 5 &\Longrightarrow p_{median}[x]\, dx = 30 \left(x^4 - 2x^3 + x^2\right) \mathrm{RECT}\left[x - \tfrac{1}{2}\right] dx \\
N = 9 &\Longrightarrow p_{median}[x]\, dx = 630 \left(x^8 - 4x^7 + 6x^6 - 4x^5 + x^4\right) \mathrm{RECT}\left[x - \tfrac{1}{2}\right] dx
\end{aligned}
\]

These are compared to the pdfs of the output of the mean filters in the figure:

Comparison of pdfs of mean and median filter for uniform probability density
function $p_f[x] = \mathrm{RECT}\left[x - \tfrac{1}{2}\right]$ for N = 3, 5, and 9. Note that the pdf of the mean
filter is “taller” and “skinnier” in all three cases, showing that it will reduce the
variance more than the median filter.
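The coefficients 6, 30, and 630 and the claim that the mean reduces the variance more than the median can be checked numerically for the uniform case. The sketch below evaluates the median pdf on [0, 1], integrates it with a simple midpoint rule, and compares its variance with 1/(12N), the variance of the mean of N independent samples that are uniform on [0, 1]. The integration routine and the function names are illustrative.

```python
from math import factorial

def median_coefficient(N):
    """N! divided by the square of ((N - 1) / 2)! for odd N."""
    half = (N - 1) // 2
    return factorial(N) // factorial(half) ** 2

def median_pdf_uniform(x, N):
    """pdf of the median of N samples drawn uniformly from [0, 1]."""
    if not 0 <= x <= 1:
        return 0.0
    half = (N - 1) // 2
    return median_coefficient(N) * x ** half * (1 - x) ** half

def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    dx = (b - a) / steps
    return sum(g(a + (i + 0.5) * dx) for i in range(steps)) * dx

def median_variance_uniform(N):
    return integrate(lambda x: (x - 0.5) ** 2 * median_pdf_uniform(x, N), 0, 1)

def mean_variance_uniform(N):
    return 1 / (12 * N)

for N in (3, 5, 9):
    print(N, median_coefficient(N),
          round(median_variance_uniform(N), 5), round(mean_variance_uniform(N), 5))
```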


Just like the mean filter, the maximum value of $p_{median}[x]$ increases and its “width”
decreases as the number of input values in the median window increases (as N increases).
The calculated pdfs for the median and mean filters over N = 3 and N = 5 samples
for input values from a uniform probability distribution are shown below to the same
scale. Note that the output distributions from the mean filter are taller than for the
median, which indicates that the median filter does a poorer job of averaging noise
than the mean. Frieden determined that the SNR of the median filter is smaller than
that of the mean by a factor of $\log_e[2] \cong 0.69$, so that there is a penalty in SNR of
about 30% for the median filter relative to the averaging filter. Put another way, the
standard deviation of the median of N samples decreases as $\left(N \cdot \log_e[2]\right)^{-\frac{1}{2}}$
instead of as $N^{-\frac{1}{2}}$. The lesser noise reduction of the median filter is offset by its ability
to preserve the sharpness of edges.
Comparison of mean and median filter: (a) bitonal object f[m] defined over 1000
samples; (b) mean of f[m] over 25 samples, showing reduction in contrast with
increasing frequency; (c) median of f[m] over 25 samples, which is identical to
f[m]; (d) f[m] + n[m], where the noise n[m] is uniformly distributed over the interval [0, 1]; (e) mean
over 25 samples; (f) median over 25 samples. Note that the highest-frequency bars
are better preserved by the median filter.


Example:  Median  Filter  Applied  to  Gaussian  Noise 


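Probably the most important application of the median filter is to attenuate Gaussian
noise (i.e., the gray values are selected from a normal distribution with zero mean)
without blurring edges. The central limit theorem indicates that the statistical character
of noise which has been generated by summing random variables from different
distributions will be Gaussian in character. The probability density function is
the Gaussian with mean value $\mu$ and variance $\sigma^2$ normalized to unit area:

\[ p_f[x] = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \]

The cumulative probability distribution of this noise is the integral of the Gaussian
probability law, which is proportional to the error function:

\[ \mathrm{erf}[x] = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt \]

We can evaluate the cumulative distribution in terms of erf[x]:

\[
C_f[x] = \int_{-\infty}^{x} p_f[\alpha]\, d\alpha =
\begin{cases}
\dfrac{1}{2} - \dfrac{1}{2}\, \mathrm{erf}\left[\dfrac{\mu - x}{\sqrt{2}\,\sigma}\right] & \text{for } x < \mu \\[2ex]
\dfrac{1}{2} + \dfrac{1}{2}\, \mathrm{erf}\left[\dfrac{x - \mu}{\sqrt{2}\,\sigma}\right] & \text{for } x \ge \mu
\end{cases}
\]

Because the error function is odd, both cases may be written as $C_f[x] = \frac{1}{2} + \frac{1}{2}\, \mathrm{erf}\left[\frac{x - \mu}{\sqrt{2}\,\sigma}\right]$.
Therefore the probabilities of the different outcomes of the median filter are:

\[ p_{med}[x]\, dx = \frac{N!}{\left(\left(\frac{N-1}{2}\right)!\right)^2} \left(\frac{1}{2} + \frac{1}{2}\, \mathrm{erf}\left[\frac{x - \mu}{\sqrt{2}\,\sigma}\right]\right)^{\frac{N-1}{2}} \left(\frac{1}{2} - \frac{1}{2}\, \mathrm{erf}\left[\frac{x - \mu}{\sqrt{2}\,\sigma}\right]\right)^{\frac{N-1}{2}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx \]

The error function is tabulated and may be evaluated to plot the probability $p_{med}[x]$.

pdf of Gaussian noise with $\mu = 1$, $\sigma = 2$ (black) and of the median for N = 3 (red),
N = 9 (blue).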
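The Gaussian case can be evaluated numerically with the error function from the Python standard library. This sketch uses the parameters of the plotted example (mean 1, standard deviation 2) and a midpoint-rule integration over ten standard deviations on either side of the mean; the integration limits, step count, and function names are illustrative choices. For N = 3 the result can be compared with the known variance of the median of three independent standard normal samples, 1 - sqrt(3)/pi.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (math.sqrt(2) * sigma)))

def median_pdf(x, N, mu, sigma):
    """pdf of the median of N (odd) independent Gaussian samples."""
    half = (N - 1) // 2
    coeff = math.factorial(N) // math.factorial(half) ** 2
    C = gaussian_cdf(x, mu, sigma)
    return coeff * C ** half * (1 - C) ** half * gaussian_pdf(x, mu, sigma)

def median_moments(N, mu, sigma, steps=20000):
    """Area, mean, and standard deviation of the median pdf (midpoint rule)."""
    a, b = mu - 10 * sigma, mu + 10 * sigma
    dx = (b - a) / steps
    xs = [a + (i + 0.5) * dx for i in range(steps)]
    ps = [median_pdf(x, N, mu, sigma) for x in xs]
    area = sum(ps) * dx
    mean = sum(x * p for x, p in zip(xs, ps)) * dx
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps)) * dx
    return area, mean, math.sqrt(var)

for N in (3, 9):
    area, mean, std = median_moments(N, mu=1.0, sigma=2.0)
    print(N, round(area, 6), round(mean, 6), round(std, 4))
```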

The graphs illustrate the theoretical averaging effects of the mean and median filters
on Gaussian noise. The graphs are plotted on the same scale and show the pdf of the
original Gaussian noise (on the left) and the output resulting from mean and median
filtering over 3 pixels (center) and after mean and median filtering over 5 pixels (right).
The calculated mean gray value and standard deviation for 2048 samples of filtered
Gaussian noise yielded the following values:

\[
\begin{aligned}
\mu_{in} &= 0.211 & \sigma_{in} &= 4.011 \\
\mu_{3-mean} &= 0.211 & \sigma_{3-mean} &= 2.355 \\
\mu_{3-median} &= 0.225 & \sigma_{3-median} &= 2.745
\end{aligned}
\]

Effect  of  Window  “Shape”  on  Median  Filter 

In the 2-D imaging case, the shape of the window over which the median is computed
also affects the output image. For example, if the 2-D median is computed over a
5 × 5 window centered on the upper-right corner pixel of a dark object on a bright
background, only 9 of the 25 pixels in the window belong to the object, so the
median will be the background value.

The median calculated over a full square window (3 × 3, etc.) will convert bright
pixels at outside corners of a bright object to dark pixels, i.e., the corners will be clipped;
it will also convert a dark background pixel at the inside corner of a bright object
to a bright pixel. It will also eliminate lines less than half as wide as the window.
Corner clipping may be prevented by computing the median over a window that
includes only the 9 values arrayed along the horizontal and vertical lines through the
center pixel. If this window is applied to the pixel in the corner, 5 of the 9 values
belong to the object, so the median is the object value and the corner is retained.
This pattern is also effective when applied to thin lines, which it does not eliminate.
Other patterns of medians are also useful [Castleman, Digital Image Processing,
Prentice-Hall, 1996, p. 249].
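
The effect of window shape at a corner can be illustrated with a small test image. This is a sketch under stated assumptions: the 9 × 9 image, the gray values 0 and 255, and the helper names square_window and plus_window are all made up for the illustration.

```python
import statistics

BRIGHT, DARK = 255, 0

# 9 x 9 test image: dark object in the lower-left part, bright background elsewhere.
image = [[DARK if (row >= 4 and col <= 4) else BRIGHT for col in range(9)]
         for row in range(9)]

def square_window(img, row, col, half=2):
    """All (2*half + 1)^2 values of the square window centered on (row, col)."""
    return [img[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]

def plus_window(img, row, col, half=2):
    """The 4*half + 1 values on the horizontal and vertical lines through (row, col)."""
    horizontal = [img[row][c] for c in range(col - half, col + half + 1)]
    vertical = [img[r][col] for r in range(row - half, row + half + 1) if r != row]
    return horizontal + vertical

corner = (4, 4)  # upper-right corner pixel of the dark object
square_result = statistics.median(square_window(image, *corner))
plus_result = statistics.median(plus_window(image, *corner))

print("5 x 5 square median:", square_result)   # background value: corner is clipped
print("plus-shaped median :", plus_result)     # object value: corner is retained
```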

16.3.2  Other  Statistical  Filters  (Mode,  Variance,  Maximum,  Minimum)

The statistical mode in the window (i.e., the most common gray level) is a useful
operator on binary images corrupted by isolated noise pixels (“salt-and-pepper noise”).
The mode is found by computing a mini-histogram of the pixels within the window and
assigning the most common gray level to the center pixel. Rules must be defined for
the case where two or more gray levels are equally common, and particularly where
every populated level contains only a single pixel. If two levels are equally populated,
the gray level of the center pixel is usually retained if it is one of those levels;
otherwise one of the most common gray levels may be selected at random.
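
A mode filter with the tie rule described above might be sketched as follows. The 3 × 3 window, the unchanged border pixels, and the deterministic choice of the lowest tied level (in place of a random choice) are assumptions made to keep the example short and repeatable.

```python
from collections import Counter

def mode_filter(img):
    """3 x 3 mode filter; pixels on the image border are copied unchanged."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            counts = Counter(window)              # mini-histogram of the window
            top = max(counts.values())
            candidates = [level for level, n in counts.items() if n == top]
            if img[r][c] in candidates:
                out[r][c] = img[r][c]             # tie rule: keep the center value
            else:
                out[r][c] = min(candidates)       # assumption: lowest level, not random
    return out

noisy = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],   # isolated "salt" pixel
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
cleaned = mode_filter(noisy)
print(cleaned[2])
```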

The variance filter (σ²) and standard deviation filter (σ) replace the center pixel with
the variance or standard deviation of the pixels in the window, respectively. The
variance filtering operation is

\[ g[x,y] = \sum_{\text{window}} \left( f[n,m] - \mu \right)^2 \]

where μ is the mean value of the pixels in the window. The output of a variance or
standard deviation operation will be large in areas where the image is busy and
small where the image is smooth. The output of the σ-filter resembles that of the
isotropic Laplacian, which computes the difference of the center pixel and the average
of the eight nearest neighbors.
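
A minimal sketch of a variance filter follows. It assumes a 3 × 3 window, leaves border pixels at zero, and divides the sum of squared deviations by the number of pixels in the window (the population variance), which only rescales the sum written above.

```python
import statistics

def variance_filter(img):
    """Replace each interior pixel by the variance of its 3 x 3 neighborhood."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = statistics.pvariance(window)
    return out

# Vertical step edge: smooth on both sides, "busy" only where the window spans the edge.
step_edge = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
sigma2 = variance_filter(step_edge)
print([round(value, 2) for value in sigma2[2]])
```

The output is zero in the two flat regions and nonzero only where the window straddles the edge, which is why the result resembles that of an edge detector.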

The maximum and minimum filters replace the gray value in the center
with the highest or lowest value in the window, respectively. The MAX filter will dilate
bright objects, while the MIN filter erodes them. These provide the basis for the so-called
morphological operators. A “dilation” (MAX) followed by an “erosion” (MIN) defines
the morphological “CLOSE” operation, while the opposite (erosion followed by
dilation) is an “OPEN” operation. The “CLOSE” operation fills gaps in lines and
removes isolated dark pixels, while the “OPEN” operation removes thin lines and
isolated bright pixels. These nonlinear operations are useful for object size classification
and distance measurements.
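
The MAX and MIN filters and their cascades can be tried on a 1-D binary line. This is a sketch only: the 3-pixel window, the truncated windows at the ends of the array, and the function names are assumptions.

```python
def rank_filter(signal, width, pick):
    """Replace each sample by pick(window); windows are truncated at the array ends."""
    half = width // 2
    return [pick(signal[max(0, i - half):i + half + 1]) for i in range(len(signal))]

def dilate(signal, width=3):      # MAX filter
    return rank_filter(signal, width, max)

def erode(signal, width=3):       # MIN filter
    return rank_filter(signal, width, min)

def closing(signal, width=3):     # dilation followed by erosion
    return erode(dilate(signal, width), width)

def opening(signal, width=3):     # erosion followed by dilation
    return dilate(erode(signal, width), width)

# Two bright runs separated by a one-pixel gap, plus one isolated bright pixel.
line = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
closed = closing(line)
opened = opening(line)
print("input :", line)
print("closed:", closed)   # the gap at index 5 is filled
print("opened:", opened)   # the isolated pixel at index 12 is removed
```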


16.4  Adaptive  Operators 

In applications such as edge enhancement or segmentation, it is often useful to
“change” or “adapt” the operator based on conditions in the image. One example
has already been considered: the nonlinear normalization used while convolving
with a bipolar convolution kernel. For another example, it is possible to enhance
differences in the direction of the local gradient (e.g., via a 1-D Laplacian) while
averaging in the orthogonal direction. In other words, the operator used to enhance the
edge information is determined by the output of the gradient operator. As another
example, the size of an averaging neighborhood could be varied based on the statistics
(e.g., the variance) of gray levels in the neighborhood.

In some sense, these adaptive operators resemble cascaded convolutions, but the
resulting operation is not space invariant and may not be described by convolution
with a single kernel. By judicious choice of algorithm, significant improvement of
image quality may be obtained.
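
The last example (an averaging neighborhood whose size depends on the local variance) might look like the following 1-D sketch. The candidate widths (5, then 3), the variance threshold of 4, and the rule "use the widest window whose variance is below the threshold" are illustrative assumptions, not a prescribed algorithm.

```python
import statistics

def window(signal, i, width):
    """Samples within `width` of index i; the window is truncated at the array ends."""
    half = width // 2
    return signal[max(0, i - half):i + half + 1]

def adaptive_mean(signal, widths=(5, 3), max_variance=4.0):
    """Average over the widest window whose variance does not exceed max_variance."""
    out = []
    for i in range(len(signal)):
        value = signal[i]                  # fall back to the unfiltered sample
        for width in widths:               # try the widest neighborhood first
            values = window(signal, i, width)
            if statistics.pvariance(values) <= max_variance:
                value = statistics.mean(values)
                break
        out.append(value)
    return out

step = [10] * 8 + [50] * 8      # sharp edge: should not be blurred
ripple = [9, 11] * 8            # low-variance texture: should be smoothed
smoothed_step = adaptive_mean(step)
smoothed_ripple = adaptive_mean(ripple)
print(smoothed_step)
print([round(value, 2) for value in smoothed_ripple])
```

Because the window shrinks near the edge, the step is passed unchanged while the low-variance ripple is averaged; no single convolution kernel produces both results.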

16.5  Convolution  Revisited:  Bandpass  Filters

The  parameters  of  a  filter  that  determine  its  effect  on  the  image  are  the  size  of  the 
kernel  and  the  algebraic  sign  of  its  coefficients.  Kernels  whose  elements  have  the 
same  algebraic  sign  are  lowpass  filters  that  compute  spatial  averages  and  attenuate 
the  modulation  of  spatial  structure  in  the  image.  The  larger  the  kernel,  the  greater  the 
attenuation.  On  the  other  hand,  kernels  that  compute  differences  of  neighboring  gray 
levels  will  enhance  the  modulation  of  spatially  varying  structure  while  attenuating 
the brightness of constant areas. Note that the smallest kernel containing elements
with different algebraic signs has two elements; the spatial first derivative is an example.

We  will  now  construct  a  hybrid  of  these  two  extreme  cases  that  will  attenuate  the 
modulation  of  image  structure  that  varies  more  slowly  or  rapidly  than  some  selectable 
rate.  In  other  words,  the  filter  will  pass  a  band  of  spatial  frequencies  and  attenuate 
the  rest  of  the  spectrum;  this  is  a  bandpass  filter.  The  bandpass  filter  will  compute 
differences  of  spatial  averages  of  gray  level.  For  example,  consider  a  1-D  image: 

\[ f[n] = 2 + \cos\left[ 2\pi \frac{n}{128} \right] + \cos\left[ 2\pi \frac{n}{64} \right] + \cos\left[ 2\pi \frac{n}{32} \right] \]


The spatial frequencies of the cosines are:

\[
\begin{aligned}
\xi_0 &= 0 \text{ cycles per sample} &&\Longrightarrow X_0 = \infty \\
\xi_1 &= \tfrac{1}{128} \cong 7.8 \cdot 10^{-3} \text{ cycles per sample} &&\Longrightarrow X_1 = 128 \text{ samples} \\
\xi_2 &= \tfrac{1}{64} \text{ cycles per sample} &&\Longrightarrow X_2 = 64 \text{ samples} \\
\xi_3 &= \tfrac{1}{32} \text{ cycles per sample} &&\Longrightarrow X_3 = 32 \text{ samples}
\end{aligned}
\]

This function is periodic over 128 samples, which is a common multiple of the periods
of all of the finite-period cosines. The extreme amplitudes are +5 and +0.2466. Consider
convolution of f[n] with several kernels; the first set are 3-pixel averagers whose
weights sum to unity, therefore preserving the mean gray level of f[n]:

\[
h_1[n] = \begin{bmatrix} 0 & +1 & 0 \end{bmatrix}, \qquad
h_2[n] = \begin{bmatrix} +\tfrac{1}{4} & +\tfrac{1}{2} & +\tfrac{1}{4} \end{bmatrix}, \qquad
h_3[n] = \begin{bmatrix} +\tfrac{1}{3} & +\tfrac{1}{3} & +\tfrac{1}{3} \end{bmatrix}
\]


Obviously, h1[n] is the identity kernel, h2[n] is a tapered averager that applies more
weight to the center pixel, while h3[n] is a uniform averager. Based on our experience
with averaging filters, we know that g1[n] = f[n] * h1[n] must be identical to f[n], while
the modulation will be reduced a bit in g2[n] (the output from h2[n]) and somewhat
more in g3[n]. This expectation is confirmed by the computed maximum and minimum
values:


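\[
\begin{array}{llll}
f_{max} = 5 & (g_1)_{max} = 5 & (g_2)_{max} = 4.987 & (g_3)_{max} = 4.983 \\
f_{min} = 0.2466 & (g_1)_{min} = 0.2466 & (g_2)_{min} = 0.2564 & (g_3)_{min} = 0.2596
\end{array}
\]

The mean gray values of these images are identical:

\[ \langle f \rangle = \langle g_1 \rangle = \langle g_2 \rangle = \langle g_3 \rangle = 2 \]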


We can define a contrast factor based on these maximum and minimum values that
is analogous to the modulation, except that the image is not sinusoidal:

\[ c_f = \frac{f_{max} - f_{min}}{f_{max} + f_{min}} \]

The corresponding factors are:

\[ c_f = 0.906 \qquad c_1 = 0.906 \qquad c_2 = 0.9022 \qquad c_3 = 0.9010 \]

which confirms the expectation.
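
These figures can be reproduced with a few lines of code. The sketch below rests on assumptions: the test signal is taken to be f[n] = 2 + cos[2πn/128] + cos[2πn/64] + cos[2πn/32], which is consistent with the frequencies, mean, and extrema quoted in this section; the kernels are [0, 1, 0], [1/4, 1/2, 1/4], and [1/3, 1/3, 1/3]; and the convolution is evaluated circularly over one 128-sample period. The function names are hypothetical.

```python
import math

N = 128
# Assumed test signal: constant 2 plus unit-amplitude cosines of periods 128, 64, 32.
f = [2 + math.cos(2 * math.pi * n / 128) + math.cos(2 * math.pi * n / 64)
     + math.cos(2 * math.pi * n / 32) for n in range(N)]

def circular_convolve(signal, kernel):
    """Convolve one period of a periodic signal with a centered, odd-length kernel."""
    half = len(kernel) // 2
    size = len(signal)
    return [sum(w * signal[(n - k) % size]
                for k, w in zip(range(-half, half + 1), kernel))
            for n in range(size)]

def contrast(signal):
    """Contrast factor (max - min) / (max + min)."""
    return (max(signal) - min(signal)) / (max(signal) + min(signal))

kernels = {
    "h1": [0, 1, 0],             # identity
    "h2": [0.25, 0.5, 0.25],     # tapered averager
    "h3": [1 / 3, 1 / 3, 1 / 3], # uniform averager
}
results = {name: circular_convolve(f, h) for name, h in kernels.items()}
for name, g in results.items():
    print(name, round(max(g), 3), round(min(g), 4),
          round(sum(g) / N, 3), round(contrast(g), 4))
```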

Now  consider  three  5-pixel  averagers: 


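\[
\begin{aligned}
h_4[n] &= \begin{bmatrix} 0 & 0 & +1 & 0 & 0 \end{bmatrix} \\
h_5[n] &= \tfrac{1}{9} \begin{bmatrix} +1 & +2 & +3 & +2 & +1 \end{bmatrix} \\
h_6[n] &= \tfrac{1}{5} \begin{bmatrix} +1 & +1 & +1 & +1 & +1 \end{bmatrix}
\end{aligned}
\]

Again, h4 is the identity kernel that reproduces the modulation of the original
image, h5 is a tapered averager, and h6 is a uniform averager. The computed maximum
and minimum values for the images are: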


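\[
\begin{array}{llll}
f_{max} = 5 & (g_4)_{max} = 5 & (g_5)_{max} = 4.967 & (g_6)_{max} = 4.950 \\
f_{min} = 0.2466 & (g_4)_{min} = 0.2466 & (g_5)_{min} = 0.272 & (g_6)_{min} = 0.285 \\
\langle f \rangle = 2 & \langle g_4 \rangle = 2 & \langle g_5 \rangle = 2 & \langle g_6 \rangle = 2 \\
c_f = 0.906 & c_4 = 0.906 & c_5 = 0.896 & c_6 = 0.891
\end{array}
\]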

The average over a larger number of samples reduces the modulation further but does
not affect the mean gray values.

[Figure: Outputs from application of various averaging kernels to f[n]. At this scale, the
differences are not noticeable.]

Now  consider  a  kernel  that  is  significantly  wider: 


\[ h_7[n] = A_7 \cos\left[ 2\pi \frac{n}{64} \right] \cdot RECT\left[ \frac{n}{32} \right] \]

where the scale factor A7 is usually used to normalize h7[n] to unit area. The RECT
function limits the support of the cosine to a finite width of 32 pixels, which is half
the period of the cosine function. The kernel is wide and nonnegative; this is another
example of a tapered averaging kernel that weights nearby pixels more heavily than
more distant pixels. The corresponding uniform averaging kernel is:

\[ h_8[n] = \frac{1}{32} RECT\left[ \frac{n}{32} \right] \]

To simplify comparison of the results of h7 and h8, we will set A7 = 1/32 instead of
a factor that ensures a unit area. The exact value of the scale factor will affect
the output amplitudes and not the modulation. Based on the experience gained
for h1, h2, ..., h6, we expect that both h7 and h8 will diminish the modulation of
spatially varying patterns, and that h8 will have the larger effect. In fact, because
the width of h8 matches the period X3 = 32, this cosine term will be attenuated to
null amplitude. The effects on the amplitudes of the array are:


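\[
\begin{array}{lll}
f_{max} = 5 & (g_7)_{max} = 2.585 & (g_8)_{max} = 3.536 \\
f_{min} = 0.2466 & (g_7)_{min} = 0.593 & (g_8)_{min} = 1.205 \\
\langle f \rangle = 2 & \langle g_7 \rangle = 2 & \langle g_8 \rangle = 2 \\
c_f = 0.906 & c_7 = 0.627 & c_8 \cong 0.492
\end{array}
\]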

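which again confirms the expectation that the uniform averager reduces the contrast
to a greater degree than the tapered averager. The uniform averager h8 has unit area
and does not affect the mean gray value; the same is true of h7 only if A7 is chosen to
give unit area. With A7 = 1/32 the area of h7 is approximately 2/π, so the extrema
listed for g7 correspond to a mean gray value of about 1.27 rather than 2.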

[Figure: Outputs from filters of width 32, which are still averaging filters.]


If the width of the RECT function in the tapered averaging kernel h7 is increased
still further while the period of the constituent cosine function is retained, the
resulting kernel includes some negative weights. For example:

\[ h_9[n] = A_9 \cos\left[ 2\pi \frac{n}{64} \right] \cdot RECT\left[ \frac{n}{48} \right] \]

The constant A9 may be chosen so that the area of h9 is unity (A9 = 0.06948), or A9
may be set to 1/48, matching the normalization factor for the uniform averager:

\[ h_{10}[n] = \frac{1}{48} RECT\left[ \frac{n}{48} \right] \]

which simplifies comparison of the resulting amplitudes. Kernel h9 computes the same
weighted average as h7 in the neighborhood of the pixel, but then subtracts a weighted
average of distant pixels from it; it computes differences of average amplitudes. The
effects of these operations on the extrema are:


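\[
\begin{array}{lll}
f_{max} = 5 & (g_9)_{max} = 1.532 & (g_{10})_{max} = 2.940 \\
f_{min} = 0.2466 & (g_9)_{min} = 0.118 & (g_{10})_{min} = 1.420 \\
\langle f \rangle = 2 & \langle g_9 \rangle = 0.5997 & \langle g_{10} \rangle = 2 \\
c_f = 0.906 & c_9 \cong 0.857 & c_{10} = 0.349
\end{array}
\]

Kernel h10 retained the mean value but further attenuated the contrast by pushing the
amplitudes toward the mean. However, the difference-of-averages kernel h9 decreased
the mean value and yielded a higher contrast than the width-32 averaging kernels
(c9 = 0.857 compared with c7 = 0.627 and c8 = 0.492).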

[Figure: Outputs from filters with impulse responses h9[n] = (1/48) RECT[n/48] · cos[2πn/64],
which now has some negative amplitudes.]


This trend may be continued by increasing the width of the RECT and using
equal scale factors:

\[
\begin{aligned}
h_{11}[n] &= \frac{1}{64} \cos\left[ 2\pi \frac{n}{64} \right] \cdot RECT\left[ \frac{n}{64} \right] \\
h_{12}[n] &= \frac{1}{64} RECT\left[ \frac{n}{64} \right]
\end{aligned}
\]

Because the width of the RECT matches the period of the cosine in h11, the area of
h11 is zero and it cannot be normalized to unit area. The extrema of these two
processes are:


\[
\begin{array}{lll}
f_{max} = 5 & (g_{11})_{max} = 0.712 & (g_{12})_{max} = 2 + \tfrac{2}{\pi} = 2.637 \\
f_{min} = 0.2466 & (g_{11})_{min} = -0.511 & (g_{12})_{min} = +1.364 \\
\langle f \rangle = 2 & \langle g_{11} \rangle = 0 & \langle g_{12} \rangle = 2 \\
c_f = 0.906 & c_{11} = 6.08 \ (?!!) & c_{12} = 0.318
\end{array}
\]

The uniform averager h12 continues to push the amplitudes toward the mean value of
2 and decreases the contrast, while the mean amplitude generated by the difference
of averages kernel is now zero, which means that the minimum is less than zero.

Note that the output g11[n] looks like the kernel h11[n]; in other words, the portion
of f[n] that was transmitted to g11[n] largely is a cosine of period 64. The distortion
is due to cosines at other frequencies.

[Figure: Outputs from convolutions with h11[n] = (1/64) RECT[n/64] · cos[2πn/64] and
h12[n] = (1/64) RECT[n/64]. The bipolar impulse response has zero area and “blocks” the
constant part of the signal.]


Now,  consider  filters  whose  widths  are  equal  to  the  period  of  /  [n] : 


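\[
\begin{aligned}
h_{13}[n] &= \frac{1}{128} \cos\left[ 2\pi \frac{n}{64} \right] \cdot RECT\left[ \frac{n}{128} \right] \\
h_{14}[n] &= \frac{1}{128} RECT\left[ \frac{n}{128} \right]
\end{aligned}
\]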

The  figures  of  merit  for  the  gray  values  of  these  arrays  are: 


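\[
\begin{array}{lll}
f_{max} = 5 & (g_{13})_{max} = 0.5 & (g_{14})_{max} = +2 \\
f_{min} = 0.2466 & (g_{13})_{min} = -0.5 & (g_{14})_{min} = +2 \\
\langle f \rangle = 2 & \langle g_{13} \rangle = 0 & \langle g_{14} \rangle = 2 \\
c_f = 0.906 & c_{13} = \infty & c_{14} = 0
\end{array}
\]

The output of the bipolar kernel h13 is a sinusoid with period 64 and zero mean, while
that of the averager h14 is the constant average value of f[n]. Note that the contrast
parameter c13 is undefined because (g13)min = −(g13)max, so that the denominator
vanishes, while c14 = 0.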

[Figure: Outputs resulting from convolution with h13 = (1/128) RECT[n/128] · cos[2πn/64] and
h14 = (1/128) RECT[n/128]; the former is a bandpass filter and the latter is a lowpass
filter.]


To summarize, kernels that compute differences of averages are wide and bipolar,
and typically yield bipolar outputs. As the width of a difference-of-averages kernel is
increased, the output resembles the kernel itself to a greater degree, which is bipolar
with zero mean. On the other hand, increasing the width of an averaging operator
results in outputs that approach a constant amplitude (the average value of the
input); this constant is a cosine with infinite period. The difference-of-averages
operator rejects BOTH slowly and rapidly varying sinusoids, and preferentially passes a
particular sinusoidal frequency or band of frequencies. Thus, difference-of-averages
operators are called bandpass filters.


Kernels  of  bandpass  filters  are  wide,  bipolar,  and  resemble  the  signal  to  be  detected. 
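
The limiting case of kernels as wide as the period of f[n] is easy to check numerically. The sketch below makes the same assumption about the test signal as before (f[n] = 2 + cos[2πn/128] + cos[2πn/64] + cos[2πn/32], consistent with the quoted extrema), takes the kernels to be (1/128) cos[2πn/64] and 1/128 over 128 samples, and evaluates the convolution circularly over one period. The names are hypothetical.

```python
import math

N = 128
# Assumed test signal (see the lead-in): constant plus cosines of periods 128, 64, 32.
f = [2 + math.cos(2 * math.pi * n / 128) + math.cos(2 * math.pi * n / 64)
     + math.cos(2 * math.pi * n / 32) for n in range(N)]

# Kernels that span one full period of f[n].
h13 = [math.cos(2 * math.pi * k / 64) / N for k in range(N)]  # bipolar: bandpass
h14 = [1 / N] * N                                             # uniform: lowpass

def circular_convolve(signal, kernel):
    """Circular convolution of two sequences of equal length."""
    size = len(signal)
    return [sum(kernel[k] * signal[(n - k) % size] for k in range(size))
            for n in range(size)]

g13 = circular_convolve(f, h13)
g14 = circular_convolve(f, h14)

print("g13: max %.3f  min %.3f  mean %.3f" % (max(g13), min(g13), sum(g13) / N))
print("g14: max %.3f  min %.3f  mean %.3f" % (max(g14), min(g14), sum(g14) / N))
```

Only the period-64 cosine survives the bipolar kernel (with amplitude 1/2), while the uniform kernel returns the constant mean value, in agreement with the tabulated extrema.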


16.6  Implementation  of  Filtering 

16.6.1  Nonlinear  and  Shift-Variant  Filtering

Special-purpose hardware (array processors) is readily available for implementing
linear operations. These contain several memory planes and can store multiple copies
of the input. By shifting the addresses of the pixels in a plane and multiplying the
values by a constant, the appropriate shifted and weighted image can be generated.
These are then summed to obtain the filtered output. A complete convolution can be
computed in a few clock cycles.
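
The shift, weight, and sum procedure can be mimicked in software. The following 1-D sketch uses circular shifts as a stand-in for shifted pixel addresses and compares the result with a direct convolution sum; the function names and the test data are assumptions.

```python
def shift(signal, k):
    """Circularly shift a signal so that shifted[n] = signal[n - k]."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else signal[:]

def convolve_by_shift_and_add(signal, kernel):
    """Sum of shifted copies of the input, each multiplied by one kernel weight."""
    half = len(kernel) // 2
    out = [0.0] * len(signal)
    for k, weight in zip(range(-half, half + 1), kernel):
        shifted = shift(signal, k)                    # one "memory plane"
        out = [acc + weight * value for acc, value in zip(out, shifted)]
    return out

def convolve_direct(signal, kernel):
    """Direct evaluation of the circular convolution sum, for comparison."""
    half = len(kernel) // 2
    size = len(signal)
    return [sum(w * signal[(n - k) % size]
                for k, w in zip(range(-half, half + 1), kernel))
            for n in range(size)]

impulse = [0, 0, 4, 0, 0, 0]
smoothed = convolve_by_shift_and_add(impulse, [0.25, 0.5, 0.25])
print(smoothed)
```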

Other than the mean filter, the statistical filters are nonlinear, i.e., the gray value
of the output pixel is obtained from those of the input pixels by some method other
than multiplication by weights and summing. In other words, they are not convolutions
and thus cannot be specified by a kernel. Similarly, operations that may be
linear (output is a sum of weighted inputs) may use different weights at different
locations in the image. Such operations are shift-variant and must be specified by
different kernels at different locations. Nonlinear and shift-variant operations are
computationally intensive and thus slower to perform unless special single-purpose
hardware is used. However, as computer speeds increase and as prices fall, this is
becoming less of a problem. Because of the flexibility of operations possible, nonlinear
shift-variant filtering is a very active research area.


16.7  Neighborhood  Operations  on  Multiple  Images 

16.7.1  Image  Sequence  Processing 

It should now be obvious that we can combine the gray levels in neighborhoods of the
input pixel in multiple images to obtain the output image g[x,y]. The multiple copies
of f[x,y] may have been spread over time (e.g., video), over wavelength (e.g., RGB
images), or some other parameter. In these cases, the image and kernel are functions
of three coordinates.

[Figure: Schematic of neighborhood operation on multiple images f[x,y,t] that differ in some
other characteristic (time, wavelength, or focus depth) to produce a single output g[x,y].]


16.7.2  Spectral  +  Spatial  Neighborhood  Operations

Additive noise is a common corrupter of digital images and may disrupt classification
algorithms based on gray-level differences, e.g., in multispectral differencing to
segment remote-sensing images. The noise can be attenuated by combining spatial
averaging and spectral differencing, i.e.,

\[ g[x,y] = f[x,y,\lambda_1] * h[x,y] - f[x,y,\lambda_2] * h[x,y] \]

where

\[ h[x,y] = \frac{1}{9} \begin{bmatrix} +1 & +1 & +1 \\ +1 & +1 & +1 \\ +1 & +1 & +1 \end{bmatrix} \]

Since convolution and subtraction are linear, the order of operations can be
interchanged:

\[ g[x,y] = \left( f[x,y,\lambda_1] - f[x,y,\lambda_2] \right) * h[x,y] \]

These can be combined into a single 3-D operation using a 3-D kernel h[x,y,λ]:

\[ g[x,y] = f[x,y,\lambda_i] * h[x,y,\lambda_i] \]

where

\[ h[x,y,\lambda_1] = +h[x,y], \qquad h[x,y,\lambda_2] = -h[x,y] \]
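
The interchange of averaging and differencing can be verified directly. This sketch assumes a 3 × 3 uniform averager with weights 1/9, sets border pixels to zero, and uses two small made-up "bands"; all names are hypothetical.

```python
def box_average(img):
    """3 x 3 uniform average (weights 1/9); pixels on the image border are set to 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(img[r + dr][c + dc]
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9
    return out

def subtract(a, b):
    """Pixel-by-pixel difference of two images of the same size."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# Two made-up spectral bands of the same 5 x 5 scene.
band_1 = [[10 + r * c for c in range(5)] for r in range(5)]
band_2 = [[3 * r + c for c in range(5)] for r in range(5)]

average_then_difference = subtract(box_average(band_1), box_average(band_2))
difference_then_average = box_average(subtract(band_1, band_2))
print(average_then_difference[2])
print(difference_then_average[2])
```

Differencing first requires one spatial convolution instead of two, which is the practical reason for interchanging the operations.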


Chapter  17 

Shape-Based  Operations 


A shape-based operation identifies or acts on groups of pixels that belong to the
same object or image component. We have already seen how components may be
identified on the basis of pixel gray level, color (multispectral gray level), or features
such as edges, corners, texture, etc. The output of a shape-based operation may
not be an image at all, but rather a description of the objects, locations, etc. in the
image.

Applications: 

•  image  segmentation 

•  image  description 

•  image  compression 

One application of both point and local neighborhood operations is image
segmentation, i.e., classification of pixels that belong in one group. In the simple cases
considered thus far, pixels were segmented by identifying clusters in a feature space
(e.g., the multidimensional color histogram). Clustered pixels are assumed to belong
to the same object class. This approach is appropriate in multispectral classification
(e.g., ground-cover classification in remote sensing) or when regions may be
distinguished by convolution with a particular kernel (e.g., patterned or textured regions).
However, certain important problems may be more readily attacked by other means.
For example, suppose that we want to build a machine vision system to read ideal
binary text (i.e., black text on white background); individual characters or words
must be segmented and identified. The histogram feature-space approach is not
appropriate because all text pixels have the same gray level. Pixel classification (i.e.,
assigning membership of pixels to specific letters or words) must be based on a
different criterion such as the shape of a group of adjoining pixels with similar gray
levels.

This type of problem leads naturally to the concept of shape-based processing,
where groups of pixels (objects) are processed as a whole. In the text processing
example, pixels that belong to an individual text character are connected, i.e., adjacent
pixels of the same gray level belong to the same character, while pixels with the same
gray level that are not connected belong to different characters. In this discussion,
we will ignore problems with overdots (e.g., in lower case i and j), which would be
identified as separate characters.

Shape-based processing is also used for symbolic image description. Once pixels
have been identified as connected, simpler shape descriptors may be derived by
structural operations such as thinning to find the object skeleton, border-following,
computation of moments of objects, and morphological transformations. The
discussions of these operations will assume that the images are binary; the principles may
be applied to gray-level imagery with some difficulty.

17.1  Pixel  Connectivity 

In  a  binary  image,  the  component  objects  are  clusters  of  foreground  pixels  (white)  on 
a  background  (black).  Two  pixels  with  the  same  gray  level  that  are  adjacent  or  that 
are  linked  through  an  unbroken  chain  of  foreground  pixels  are  said  to  be  connected 
and  are  assumed  to  belong  to  the  same  object.  Unconnected  foreground  pixels  must 
belong  to  different  objects.  The  concepts  of  connectivity  and  adjacency  must  be  well 
defined  before  a  program  may  be  written  to  identify  and  distinguish  the  foreground 
objects  in  the  image. 

A pixel on a rectangular grid is said to be four-connected or eight-connected when
it has the same properties (gray level) as one of its nearest four or eight neighbors.

[Figure: Definition of pixel connectivity: the four dark pixels are “four-connected” to pixel A;
the eight dark pixels are “eight-connected” to pixel B.]

Some questions may arise about whether some groups of connected pixels are
connected to other groups. The picture below is composed of dark foreground objects
on a bright background. How many dark objects are there? Regardless of the
connectivity of the foreground and background, the dark pixels on the right compose a
single foreground object which divides the background into two distinct regions. The
number of foreground objects on the left depends on the definition of connectivity. If
both foreground and background are considered to be 4-connected, the dark foreground
pixels on the left compose four distinct objects and the background has two distinct
components. Of course, if there are four distinct 4-connected foreground objects, then
all background pixels should belong to a single object. Thus if the foreground is
4-connected, the background should be 8-connected. If the foreground is 8-connected,
then there is a single foreground object on the left which divides the background into
two components (inside and outside), and thus the background should be 4-connected.

If both foreground and background are considered to be 8-connected, then the
four clusters A, B, C, and D are one foreground object, and E and F comprise a
single background object. Perceptually, however, E would be considered to be a hole
in the foreground object. If the foreground and background are both considered
4-connected, the clusters A, B, C, and D are separate foreground objects and E and F are
separate background objects. The foreground and background are usually assumed
to have complementary connectivity, e.g., 8-connected foreground and 4-connected
background.


[Figure: The complementarity of connectivity: if the “foreground” object is 4-connected, the
“background” object is assumed to be 8-connected, and vice versa. This assumption
ensures that the gray pixels on the left form one object if 8-connected, so that the
white pixels “inside” and “outside” the object are not connected.]


17.2  Image  Labeling 

The membership of a pixel in a component is specified by assigning a specific label
to all pixels belonging to that component. Algorithms have been developed to assign
labels automatically, and the labeled clusters may then be trivially segmented by
histogram thresholding. One simple technique requires two row-by-row scans of a
binary image. When a foreground pixel is encountered during the first scan, its
previously scanned neighbors are examined as shown:

[Figure: Pixel labeling; the shaded pixel is to be labeled, and A, B, and C are its previously
scanned neighbors.]


The  shaded  pixel  is  to  be  labeled;  if  the  object  is  assumed  to  be  4-connected,  the 
already  assigned  labels  of  pixels  B  and  C  are  checked.  If  either  is  also  foreground, 
then  the  shaded  pixel  must  belong  to  the  same  foreground  object  and  receives  the 
same  label.  If  neither  B  nor  C  is  foreground,  then  the  shaded  pixel  is  assigned  a  new 
label.  If  both  B  and  C  are  foreground,  but  with  different  labels,  then  the  shaded 
pixel  connects  two  previously  unconnected  regions;  it  is  assigned  one  of  the  labels 
and  the  equivalence  of  the  two  labels  is  noted  in  an  equivalence  table.  During  the 
second  scan,  all  equivalent  labels  are  redefined  to  generate  the  final  labeled  image.  If 
the  foreground  is  assumed  to  be  8-connected,  the  same  procedure  is  followed,  but  the 
label  of  pixel  A  is  also  considered  when  labels  are  assigned. 
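
The two-scan procedure might be sketched as below. This is an illustrative implementation, not a definitive one: the equivalence table is kept as a small union-find dictionary, labels are positive integers with 0 for background, and for 8-connectivity the sketch checks both upper diagonal neighbors (upper-left and upper-right) in addition to the pixels above and to the left, because a raster scan needs the upper-right neighbor to detect links along the other diagonal.

```python
def label_components(img, connectivity=4):
    """Two-pass labeling of a binary image (nonzero = foreground). Returns a label image."""
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = {}                                  # equivalence table

    def find(label):
        while parent[label] != label:
            label = parent[label]
        return label

    offsets = [(0, -1), (-1, 0)]                 # left and upper neighbors
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1)]           # upper-left and upper-right neighbors

    next_label = 1
    for r in range(rows):                        # first scan: provisional labels
        for c in range(cols):
            if not img[r][c]:
                continue
            found = {find(labels[r + dr][c + dc]) for dr, dc in offsets
                     if 0 <= r + dr < rows and 0 <= c + dc < cols
                     and labels[r + dr][c + dc]}
            if not found:
                parent[next_label] = next_label  # new object
                labels[r][c] = next_label
                next_label += 1
            else:
                keep = min(found)
                labels[r][c] = keep
                for other in found:
                    parent[other] = keep         # note the equivalence
    for r in range(rows):                        # second scan: resolve equivalences
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels

def count_objects(img, connectivity):
    return len({v for row in label_components(img, connectivity) for v in row if v})

ring = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
print(count_objects(ring, 4), count_objects(ring, 8))
```

The four clusters of the ring touch only at their corners, so the count depends on the connectivity assumed for the foreground.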


After  labeling  the  components,  image  segmentation  is  quite  trivial;  different  labels 
may  be  used  like  different  gray  levels  to  segment  the  image  by  thresholding. 

17.2.1  Example:

[Figure: Bitonal input image to be labeled, and the labeled results for four-connectivity and
eight-connectivity.]


The pixels that divide the foreground and background define the borders of objects in
the image, and are very useful for pattern recognition and image compression. It is
often useful to label the border pixels of objects in the image. Recall that the
connectivity of the foreground and background must be complementary (e.g., 4-connected
foreground ⇒ 8-connected background). The border may be defined as a set of pixels
or as the line between pixels, in one of three ways:

1.  pixels  belonging  to  the  foreground  object  that  are  adjacent  to  background  pixels, 
or 

2.  pixels  belonging  to  the  background  that  are  adjacent  to  foreground  pixels,  or 

3.  the  crack  between  foreground  and  background. 

The borders must define a closed curve, which may be specified by its beginning (e.g.,
upper left corner) and the pixel-by-pixel direction around the object. A common
notation for specifying the border is to assign a direction code at each pixel
using this numbering system:

    3  2  1
    4  *  0
    5  6  7

where the asterisk indicates the edge pixel being examined. This window is placed
at the location of the upper-left-most edge pixel in the image of the edge. The
number specifies the direction to the next edge pixel encountered when proceeding
around the edge in a counterclockwise direction. The code is easily remembered:
direction m lies at the angle 45° · m. The border code must be consistent with the
connectivity of the region; if the border is defined by 4-connected foreground pixels,
then only codes 0, 2, 4, and 6 are allowed and only 2 bits are required to store the
code; if 8-connected, all 8 directions are allowed and 3 bits are needed. The border
code is often called a chain code. If definition [3] is used, the four directions of the
resulting crack code are specified in an identical fashion and require 2 bits per segment.

Border  coding  is  a  useful  tool  in  image  recognition  and  compression,  i.e.,  it  reduces 
the  number  of  bits  required  to  store/transmit  the  image.  Binary  foreground  objects 
may  be  completely  specified  by  the  border  codes  with  fewer  bits  than  storing  the 
locations  of  each  foreground  pixel. 
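
The direction codes can be decoded back into pixel locations with a short routine. This sketch assumes (row, column) coordinates with rows increasing downward, so that code 2 ("up") decreases the row index; the names STEPS, decode_chain, and bits_required are hypothetical.

```python
# (row step, column step) for direction codes 0..7; code m points along 45*m degrees,
# measured counterclockwise from the rightward direction.
STEPS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def decode_chain(start, codes):
    """Return the list of pixel locations visited, beginning at `start`."""
    path = [start]
    row, col = start
    for code in codes:
        d_row, d_col = STEPS[code]
        row, col = row + d_row, col + d_col
        path.append((row, col))
    return path

def bits_required(codes, connectivity):
    """Storage for the codes alone: 2 bits per link if 4-connected, 3 bits if 8-connected."""
    if connectivity == 4 and any(code % 2 for code in codes):
        raise ValueError("a 4-connected border may use only codes 0, 2, 4, and 6")
    return (2 if connectivity == 4 else 3) * len(codes)

# Border of a 3 x 3 square, traced counterclockwise from its upper-left pixel.
square_border = [6, 6, 0, 0, 2, 2, 4, 4]
path = decode_chain((0, 0), square_border)
print(path)
print(bits_required(square_border, 4), "bits")
```

A closed border returns to its starting pixel, which gives a simple consistency check on a stored chain code.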


17.3  Border  Operations 

17.3.1  Contraction/erosion  and  expansion/dilation 

Bright foreground objects in binary images may be shrunk by deleting (turning to
black) the border pixels of the foreground (as defined in [1] above) or expanded by
converting the border pixels of the background to white (foreground). These two
operations (erosion and dilation) are the basic components of image morphology.
These operations may be applied to a number of imaging problems, including noise
removal, image compression, and image specification. This field of image processing
has generated recent interest because of its applications to machine vision problems.

Most morphological operations are based on the two basic operations of erosion
and dilation. In the simplest cases, erosion transforms all object pixels that are
adjacent to background pixels to background. Dilation is the opposite; background
pixels adjacent to object pixels are transformed to object pixels. In the more general
case, a shape function p[x,y] is applied to objects in the image and determines which
pixels would be converted. The shape function is often called the structuring element
or probe, and must specify the origin of coordinates, i.e., the pixel p[0,0]. The erosion
of the image is the set of pixels defining those locations of p[0,0] for which p[x,y]
fits wholly within the image object. The dilation of the image is the set of pixels
defined by the locations of the coordinate origin of the structuring element where the
element touches any object pixels in the image. The generalized erosion and dilation
are identical to the simple definitions if the structuring element consists of the pixel
at the origin and its adjacent neighbors (e.g., a 3 × 3 square); a structuring element
consisting only of the single pixel at the origin leaves the image unchanged.


In the remainder of this discussion, we will denote the erosion of an image by the
symbol “⊟”, so that the erosion of f[x,y] by the “probe” function p[x,y] is:

\[ f[x,y] \boxminus p[x,y] \]

The dilation will be represented by the symbol “⊞”, so that the dilation of f[x,y] by
the “probe” function p[x,y] is:

\[ f[x,y] \boxplus p[x,y] \]

In general, the operation of dilation commutes but erosion does not, i.e.,

\[
\begin{aligned}
f[x,y] \boxplus p[x,y] &= p[x,y] \boxplus f[x,y] \\
f[x,y] \boxminus p[x,y] &\neq p[x,y] \boxminus f[x,y]
\end{aligned}
\]


The  morphological  dilation  and  erosion  operations  are  shift-invariant,  because  a 
translation  of  the  object  results  in  an  identical  translation  of  the  erosion  or  dilation. 
However,  they  are  not  linear,  i.e.,  the  erosion  of  the  sum  of  two  images  is  not  the 
sum  of  the  erosions,  and  thus  may  not  be  computed  by  convolutions,  but  rather  by 
determining  if  each  pixel  satisfies  the  requisite  properties  for  the  operation.  For  this 
reason,  morphological  operations  tend  to  be  computationally  intensive,  to  the  point 
where hours of CPU time may be required on general-purpose computers. However,
they may be quickly computed in the binary case by evaluating the gray-scale
crosscorrelation of the binary image and the binary “probe” function (sometimes called the
“structuring element”). The gray-scale crosscorrelation is thresholded; the dilation is
obtained by thresholding just above the minimum value and erosion by thresholding
at a level just below the maximum.
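The correlate-and-threshold recipe can be sketched in 1-D as follows. This is a minimal illustration, not the author's implementation: the function names are hypothetical, samples outside the signal are assumed to be background (0), the "maximum" is taken to be the sum of the probe weights, and the probe is assumed symmetric so that correlation and convolution agree.

```python
def cross_correlate(f, p, origin):
    """Gray-scale crosscorrelation g[n] = sum_k f[n + k - origin] * p[k].

    Samples outside f are treated as background (0); this is an assumption.
    """
    g = []
    for n in range(len(f)):
        total = 0
        for k, weight in enumerate(p):
            j = n + k - origin
            if 0 <= j < len(f):
                total += f[j] * weight
        g.append(total)
    return g


def erode(f, p, origin):
    """Erosion: threshold the correlation just below its largest possible value."""
    peak = sum(p)  # value reached only where the probe fits wholly inside the object
    return [1 if v > peak - 0.5 else 0 for v in cross_correlate(f, p, origin)]


def dilate(f, p, origin):
    """Dilation: threshold the correlation just above its minimum value (0)."""
    return [1 if v > 0.5 else 0 for v in cross_correlate(f, p, origin)]


if __name__ == "__main__":
    f = [0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
    p = [1, 1, 1]  # symmetric three-pixel probe, origin at its centre
    print(cross_correlate(f, p, 1))
    print(erode(f, p, 1))
    print(dilate(f, p, 1))
```

Both operations reuse the same correlation; only the threshold differs, which is why the binary case is cheap compared with testing the probe pixel by pixel.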
[Figure] 1-D illustration of erosion and dilation of a bitonal object via correlation: (a) $f[n]$;
(b) “probe” function $p[n]$; (c) $g[n] = f[n] * p[-n] = f[n] \star p[n]$; (d) erosion
obtained by thresholding $g[n]$ just below the maximum; (e) dilation obtained by
thresholding $g[n]$ just above the minimum.


17.4 Cascaded Morphological Operations: “Opening” and “Closing”

The general erosion and dilation operators shrink and expand bright foreground objects,
respectively. They are near-inverses of each other, but if an erosion removes an object
consisting of an isolated foreground pixel, a subsequent dilation will not recreate the
original object. Because they are not exact inverses, the cascade of erosion followed
by dilation yields a different result than dilation followed by erosion. The former
is the morphological opening of the image because it opens up (removes) foreground
regions that are smaller than the structuring element. The opening operation smooths
the boundary of bright binary objects while breaking apart any objects connected by
thin lines. The cascade of dilation followed by erosion is the morphological closing of
the image and fills in gaps in the foreground that are smaller than the structuring
element. The closing operator fills holes within, or spaces between, bright objects in
the image.

$$\text{Opening} = (f[x,y] \boxminus p[x,y]) \boxplus p[x,y]$$

$$\text{Closing} = (f[x,y] \boxplus p[x,y]) \boxminus p[x,y]$$

The opening and closing operations may be further cascaded or combined in various
ways to perform useful operations. For example, the difference between the set of
pixels that belong to the object and the set resulting from erosion with an
averaging-like element yields the boundary pixels of the object:

$$\text{Boundary} = f[x,y] - (f[x,y] \boxminus p[x,y]) \quad \text{for} \quad p[x,y] = \begin{bmatrix} +1 & +1 & +1 \\ +1 & +1 & +1 \\ +1 & +1 & +1 \end{bmatrix}$$
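The cascades described in this section can be sketched for a 2-D binary image stored as a list of rows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the structuring element is a square block of ones centred on its origin, and pixels outside the image are treated as background.

```python
def erode(img, size=3):
    """Erosion: keep a pixel only if the whole block of ones fits inside the object."""
    h, w = len(img), len(img[0])
    r = size // 2
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


def dilate(img, size=3):
    """Dilation: set a pixel if the block of ones touches any object pixel."""
    h, w = len(img), len(img[0])
    r = size // 2
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


def opening(img, size=3):
    """Erosion followed by dilation."""
    return dilate(erode(img, size), size)


def closing(img, size=3):
    """Dilation followed by erosion."""
    return erode(dilate(img, size), size)


def boundary(img, size=3):
    """Object pixels that do not survive erosion."""
    eroded = erode(img, size)
    return [[a - b for a, b in zip(row, erow)] for row, erow in zip(img, eroded)]


if __name__ == "__main__":
    block = [[0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 1, 1, 0],
             [0, 0, 0, 0, 0]]
    for row in boundary(block):
        print(row)
```

For the 3 x 3 block above, erosion leaves only the centre pixel, so the boundary is the ring of eight surrounding pixels; closing a copy of the block with its centre removed fills the hole again.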

The so-called “hit-or-miss” operator is obtained by computing the erosion of the
image and the dilation of the complementary image, and then finding the intersection
of the two:

$$\text{Hit-or-Miss} = (f[x,y] \boxminus p[x,y]) \cap ((1 - f[x,y]) \boxplus p[x,y])$$

$$= (f[x,y] \boxminus p[x,y]) \cap (\bar{f}[x,y] \boxplus p[x,y])$$

where $\bar{f}[x,y]$ denotes the complement image to $f[x,y]$, i.e.,

$$\bar{f}[x,y] = 1 - f[x,y]$$

The  concepts  of  morphological  operations  are  presently  being  extended  to  gray-level 
(non-binary)  images. 


17.5  Applications  of  Morphological  Operations 

17.5.1  Noise  Removal 

Additive noise in a binary image is often called salt-and-pepper; dark pixels are
sprinkled in the white foreground and white pixels in the dark background. Isolated
dark pixels may be removed by sequentially expanding and contracting borders of the
foreground; the complementary cycle will remove isolated white pixels.

[Figure: noise removal; panel title “After Subsequent Dilation”.]


17.5.2  Medial  Axis  Transform 

When subjected to repeated erosion operations, the objects in the image will gradually
shrink to the point where any further erosions will cut the object into discontinuous
parts. The image obtained by assigning the number of erosions required to reach that
point to each remaining pixel is the medial axis of the object, and the ensemble of
those pixels is the skeleton of the object. These images are very useful for identifying
components of the image and/or their orientation, i.e., image representation, and for
reducing the number of data bits necessary to specify the representation, i.e., image
compression.


[Figure: original bitonal image.]

Shrinking and expansion operations are special cases of a more general field of
image processing operations known as morphological operations, which specify the
pixels that are inscribed by or circumscribed by other user-defined shapes (probes or
structuring elements).

17.6 Binary Morphological Operations


17.6.1  Horizontal  Structuring  Element 


[Figure: panels labeled $f[n,m]$, $p_1[n,m]$, $f[n,m] * p_1[n,m]$, Opening, Dilation, and Closing.]

Bitonal morphological erosion, dilation, opening (dilation of erosion), and closing
(erosion of dilation) for a horizontal “structuring element” (probe function). Note
that the horizontal “spaces” in the characters are filled in the closing.
17.6.2 Vertical Structuring Element

[Figure: panels include Opening and Closing.]

Bitonal morphological erosion, dilation, opening (dilation of erosion), and closing
(erosion of dilation) for a vertical “structuring element” (probe function). Note that
the vertical “spaces” in the characters are filled in the closing.

Thin horizontal structures are removed by opening with a vertical structuring element.
Vertical holes in the object are filled by closing with a vertical structuring element.


Chapter  18 

Geometric  Operations 


To this point, the image processing operations have computed the gray value (digital
count) of the output image pixel based on the gray values of one or more input pixels;
in other words, the operation “changed” the gray value to something new. Geometrical
image processing operations are fundamentally different; instead of modifying the
gray values of pixels, they redefine the pixel locations without changing their values
(except to interpolate the values to the new pixel grid). In other words, geometrical
operations change the spatial relationships between image pixels to correct distortions
due to recording geometry, scale changes, rotations, perspective (keystoning), or
curved or irregular object surfaces. In this section, we will define the procedures
for specifying and implementing a range of geometrical operations.

[Figure: input and output images.]


In theory, it is possible (though exhausting!) to describe geometric transformations
via a lookup table of input/output coordinates, i.e., the table would specify new
output coordinates $[x',y']$ for each input location $[x,y]$. Equivalently, the coordinate
lookup table could specify the input coordinates $[x,y]$ that map to a specific output
pixel $[x',y']$. For an $N \times M$ image, such a lookup table would contain $N \cdot M$ ordered
pairs (i.e., 262,144 pairs for a 512 × 512 image). The lookup table can specify any
arbitrary geometric transform, i.e., input pixels in the same neighborhood may move to
locations that are very far apart in the output image. Besides using large blocks of
computer memory, coordinate lookup tables are more general than usually necessary. In
realistic applications, neighboring pixels in the input image will usually remain in
close proximity in the output image, i.e., their coordinates will be transformed in a
similar manner. Such coordinate mappings are also called rubber-sheet transformations
or image warping and may be specified by a set of parametric equations that specify the
output coordinates for a given input position:

$$x' = \alpha[x,y]$$

$$y' = \beta[x,y]$$

$$\Longrightarrow \mathcal{O}\{f[x,y]\} = f[x',y'] = f[\alpha[x,y], \beta[x,y]].$$

The gray level $f$ at the location $[x,y]$ is transferred to the new coordinates $[x',y']$ to
create the output image $f[x',y']$. This sequence of operations is equivalent to defining
a “new” image $g[x,y]$ in the same coordinates $[x,y]$ but with different gray levels;
hence:

$$g[x,y] = f[\alpha[x,y], \beta[x,y]].$$


To preserve the arrangement of pixel gray values in neighborhoods, the transformation
of the continuous coordinates should be continuous, i.e., the derivative must
be finite everywhere. A power series of the input coordinates for positive powers
satisfies this requirement:

$$x' = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} a_{nm} x^n y^m = a_{00} + a_{10}x + a_{01}y + a_{11}xy + a_{20}x^2 + \cdots$$

$$y' = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} b_{nm} x^n y^m = b_{00} + b_{10}x + b_{01}y + b_{11}xy + b_{20}x^2 + \cdots$$

In practice, the infinite series is truncated, thus limiting the range of possible
transformations. Under many conditions, only four terms are necessary in each expression,
i.e., $a_{nm} = b_{nm} = 0$ for $n \geq 2$ or $m \geq 2$:

$$x' = a_{00} + a_{10}x + a_{01}y + a_{11}xy$$

$$y' = b_{00} + b_{10}x + b_{01}y + b_{11}xy$$


Such a transformation is called bilinear. There are eight unknown coefficients in the
transformation, and thus eight independent equations are needed to find a solution.
Knowledge of the coordinate transformation of the vertices of a quadrilateral is
sufficient to find a solution for this transformation. In other words, knowledge of the
action on four locations

$$[x_1, y_1] \to [x_1', y_1']$$

$$[x_2, y_2] \to [x_2', y_2']$$

$$[x_3, y_3] \to [x_3', y_3']$$

$$[x_4, y_4] \to [x_4', y_4']$$
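will allow calculation of the eight coefficients. The problem may be cast in matrix
notation where the known inputs $[x_i, y_i]$ and outputs $[x_i', y_i']$ are arranged as column
vectors in the matrices $\mathbf{X}$ and $\mathbf{X}'$, respectively, and the unknown coefficients $a_{ij}$ and
$b_{ij}$ are the rows of the matrix $\mathbf{A}$ in the expression:

$$\mathbf{A} \cdot \mathbf{X} = \mathbf{X}'$$

$$\begin{bmatrix} a_{00} & a_{10} & a_{01} & a_{11} \\ b_{00} & b_{10} & b_{01} & b_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ x_1 & x_2 & x_3 & x_4 \\ y_1 & y_2 & y_3 & y_4 \\ x_1 y_1 & x_2 y_2 & x_3 y_3 & x_4 y_4 \end{bmatrix} = \begin{bmatrix} x_1' & x_2' & x_3' & x_4' \\ y_1' & y_2' & y_3' & y_4' \end{bmatrix}$$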

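If $\mathbf{X}$ is square (and if no three of the known points lie on the same straight line),
then the inverse matrix $\mathbf{X}^{-1}$ exists; the coefficients $a_{ij}$ and $b_{ij}$ may be found via a
straightforward calculation:

$$(\mathbf{A} \cdot \mathbf{X}) \cdot \mathbf{X}^{-1} = \mathbf{X}' \cdot \mathbf{X}^{-1}$$

$$\Longrightarrow \mathbf{A} = \mathbf{X}' \cdot \mathbf{X}^{-1}$$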
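The four-point calculation can be sketched as follows. Rather than forming the inverse matrix explicitly, this sketch solves the equivalent transposed system (one row of the 4 x 4 system per control point) for each row of the coefficient matrix; the function names `solve` and `bilinear_coefficients` are hypothetical, and exact fractions are used only to keep the illustration free of rounding error.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * v = rhs by Gauss-Jordan elimination with exact fractions.

    Raises StopIteration if the matrix is singular (no nonzero pivot found).
    """
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def bilinear_coefficients(src, dst):
    """Return ([a00, a10, a01, a11], [b00, b10, b01, b11]) from four control points.

    src and dst are lists of four (x, y) pairs: input and output locations.
    """
    rows = [[1, x, y, x * y] for x, y in src]  # each row is one column of X
    a = solve(rows, [xp for xp, _ in dst])
    b = solve(rows, [yp for _, yp in dst])
    return a, b


if __name__ == "__main__":
    src = [(0, 0), (1, 0), (0, 1), (1, 1)]   # unit square
    dst = [(2, 3), (4, 3), (2, 6), (5, 8)]   # arbitrary quadrilateral
    a, b = bilinear_coefficients(src, dst)
    print([str(v) for v in a], [str(v) for v in b])
```

Mapping the unit square makes the coefficients easy to check by hand: the corner at the origin fixes the constant terms, the two adjacent corners fix the linear terms, and the opposite corner fixes the cross term.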


The four known vertices in the input and output images are sometimes called “control
points”. More control points may be used in a bilinear fit, thus making the problem
“overdetermined” (more equations than unknowns). A unique solution of an
overdetermined problem may not exist if there is uncertainty (“noise”) in the data. Under
such conditions, either a least-squares solution is generated or the control points are
applied locally to find local transformations for different sections of the image. If the
distortion cannot be adequately represented by a power series with eight coefficients,
then more than four control points are required.

[Figure: image $g[x,y]$ with control points.]

18.1 Least-Squares Solution for Warping

The procedure for computing the least-squares solution for the coefficients of a
geometric transformation is quite easy in matrix notation. For example, consider the
geometric transformation which adds a constant coordinate translation and
magnifications in the orthogonal directions. The coordinate equations are:

$$x' = a_{00} + a_{10}x + a_{01}y$$

$$y' = b_{00} + b_{10}x + b_{01}y$$

where the coefficients $a_{ij}$ and $b_{ij}$ must be calculated. The system of equations has
six unknown quantities and so requires six equations to obtain a solution. Since each
control point (input-output coordinate pair) yields equations for both $x$ and $y$, three
control points are needed. If more control points (say five) are available and consistent,
the extras may be ignored and the matrix inverse computed as before. If the
positions of the control points are uncertain, the equations will generally be
inconsistent and no exact solution will exist. Under these conditions, the additional
control points will improve the estimate of the transformation. The computation of the
coefficients which minimizes the squared error in the transformation is called the
least-squares solution, and is easily computed using matrix algebra as a pseudoinverse.

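If we have five known control point pairs, the matrix transformation is composed
of the 2 row by 3 column matrix $\mathbf{A}$, the 3 row by 5 column matrix $\mathbf{X}$, and the 2 row
by 5 column matrix $\mathbf{X}'$:

$$\mathbf{A} \cdot \mathbf{X} = \mathbf{X}'$$

$$\begin{bmatrix} a_{00} & a_{10} & a_{01} \\ b_{00} & b_{10} & b_{01} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ x_1 & x_2 & x_3 & x_4 & x_5 \\ y_1 & y_2 & y_3 & y_4 & y_5 \end{bmatrix} = \begin{bmatrix} x_1' & x_2' & x_3' & x_4' & x_5' \\ y_1' & y_2' & y_3' & y_4' & y_5' \end{bmatrix}$$

If $\mathbf{X}$ were square (and invertible), we would compute $\mathbf{X}^{-1}$ to find $\mathbf{A}$ via:

$$\mathbf{A} = \mathbf{X}' \cdot \mathbf{X}^{-1}$$

However, $\mathbf{X}$ is NOT square in this case, and thus its inverse $\mathbf{X}^{-1}$ does not exist. We
may evaluate a pseudoinverse of $\mathbf{X}$ that implements a least-squares solution for $\mathbf{A}$
via the following steps for the general case where $\mathbf{X}$ has $p$ rows and $q$ columns
($p = 3$ and $q = 5$ in the example above).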

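1. Multiply both sides of the equation from the right by the transpose matrix $\mathbf{X}^T$,
which is obtained from $\mathbf{X}$ by exchanging the rows and columns to obtain a
matrix with $q$ rows and $p$ columns. The result is:

$$(\mathbf{A} \cdot \mathbf{X}) \cdot \mathbf{X}^T = \mathbf{X}' \cdot \mathbf{X}^T$$

2. The associativity of matrix multiplication allows the second and third matrices
on the left-hand side to be multiplied first:

$$(\mathbf{A} \cdot \mathbf{X}) \cdot \mathbf{X}^T = \mathbf{A} \cdot (\mathbf{X} \cdot \mathbf{X}^T)$$

3. The matrix $\mathbf{X} \cdot \mathbf{X}^T$ is $p \times p$ square. In the example above, the product of the
two matrices is $3 \times 3$ square:

$$\mathbf{X} \cdot \mathbf{X}^T = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ x_1 & x_2 & x_3 & x_4 & x_5 \\ y_1 & y_2 & y_3 & y_4 & y_5 \end{bmatrix} \begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \\ 1 & x_4 & y_4 \\ 1 & x_5 & y_5 \end{bmatrix} = \begin{bmatrix} 5 & \sum_{i=1}^{5} x_i & \sum_{i=1}^{5} y_i \\ \sum_{i=1}^{5} x_i & \sum_{i=1}^{5} x_i^2 & \sum_{i=1}^{5} x_i y_i \\ \sum_{i=1}^{5} y_i & \sum_{i=1}^{5} x_i y_i & \sum_{i=1}^{5} y_i^2 \end{bmatrix}$$

Since $\mathbf{X} \cdot \mathbf{X}^T$ is square, there is some chance that its inverse $(\mathbf{X} \cdot \mathbf{X}^T)^{-1}$ exists.
If so, then we can multiply the left-hand side from the right by this inverse; the
result is the desired matrix of coefficients $\mathbf{A}$:

$$\mathbf{A} \cdot (\mathbf{X} \cdot \mathbf{X}^T) \cdot (\mathbf{X} \cdot \mathbf{X}^T)^{-1} = \mathbf{A}$$

4. If we perform the same series of steps on the right-hand side, we obtain the
desired formula:

$$\mathbf{A} = (\mathbf{X}' \cdot \mathbf{X}^T) \cdot (\mathbf{X} \cdot \mathbf{X}^T)^{-1}$$

$$= \mathbf{X}' \cdot \left( \mathbf{X}^T \cdot (\mathbf{X} \cdot \mathbf{X}^T)^{-1} \right)$$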
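The steps above can be sketched for the affine example with any number of control points. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, and the square system with the symmetric matrix X·Xᵀ is solved row by row instead of forming its inverse explicitly.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * v = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def least_squares_affine(src, dst):
    """Return A = X' X^T (X X^T)^-1 as [[a00, a10, a01], [b00, b10, b01]]."""
    X = [[1] * len(src), [x for x, _ in src], [y for _, y in src]]   # 3 x q
    Xp = [[xp for xp, _ in dst], [yp for _, yp in dst]]              # 2 x q
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    XXt = [[dot(r1, r2) for r2 in X] for r1 in X]                    # 3 x 3, symmetric
    XpXt = [[dot(r1, r2) for r2 in X] for r1 in Xp]                  # 2 x 3
    # Each row of A satisfies (X X^T) * row = corresponding row of X' X^T.
    return [solve(XXt, row) for row in XpXt]


if __name__ == "__main__":
    src = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
    # Outputs generated from x' = 1 + 2x + y and y' = 4 - x + 3y (noise-free).
    dst = [(1, 4), (3, 3), (2, 7), (4, 6), (8, 11)]
    for row in least_squares_affine(src, dst):
        print([str(v) for v in row])
```

With consistent (noise-free) control points the least-squares solution reproduces the generating coefficients exactly; with noisy points it returns the coefficients that minimize the summed squared coordinate error.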
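For a bilinear fit to five control points, the matrices take the form:

$$\mathbf{A} = \begin{bmatrix} a_{00} & a_{10} & a_{01} & a_{11} \\ b_{00} & b_{10} & b_{01} & b_{11} \end{bmatrix}, \quad \mathbf{X}^T = \begin{bmatrix} 1 & x_1 & y_1 & x_1 y_1 \\ 1 & x_2 & y_2 & x_2 y_2 \\ 1 & x_3 & y_3 & x_3 y_3 \\ 1 & x_4 & y_4 & x_4 y_4 \\ 1 & x_5 & y_5 & x_5 y_5 \end{bmatrix}, \quad \mathbf{X}' = \begin{bmatrix} x_1' & x_2' & x_3' & x_4' & x_5' \\ y_1' & y_2' & y_3' & y_4' & y_5' \end{bmatrix}$$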
18.2 Common Geometrical Operations

$x' = x,\ y' = y \Longrightarrow g[x,y] = f[x',y'] = f[x,y] \Longrightarrow$ identity transformation, no change

$x' = x + x_0,\ y' = y + y_0 \Longrightarrow g[x,y] = f[x + a_{00}, y + b_{00}] \Longrightarrow$ translation by $[a_{00}, b_{00}]$

$x' = ax,\ y' = by \Longrightarrow g[x,y] = f[a_{10}x, b_{01}y] \Longrightarrow$ spatial stretching or scaling

$x' = x + ay,\ y' = y \Longrightarrow g[x,y] = f[x + a_{01}y, y] \Longrightarrow$ skew

$x' = x + axy,\ y' = y \Longrightarrow g[x,y] = f[x + a_{11}xy, y] \Longrightarrow$ perspective distortion

$$x' = \alpha[x,y] = x \cos\theta - y \sin\theta$$

$$y' = \beta[x,y] = x \sin\theta + y \cos\theta$$

$\Longrightarrow$ rotation through $\theta$

$[x', y']$ are the coordinates of the input image that are mapped to $[x,y]$ in the
output image.


[Figure: examples of translation, scaling, skew, perspective, and rotation.]


18.3  Pixel  Transfers 


As already mentioned, a geometrical transformation may be implemented by specifying
where each pixel of the input image is located in the output grid, or by
specifying the location of each output pixel on the input grid. Except for certain (and
usually uninteresting) transformations, pixels of the input image will rarely map
exactly to pixels of the output grid. It is therefore necessary to interpolate the gray
value from pixels with noninteger coordinates (i.e., nongrid points) to pixels with
integer coordinates that lie upon the grid. The method that transfers the input pixel
coordinates and interpolates the gray value on the output grid is called pixel carryover
by Castleman. It may also be called an input-to-output mapping. The algorithm that
locates the output pixel upon the input grid and interpolates the gray value of the
input pixels is called pixel filling or an output-to-input mapping.
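Pixel filling can be sketched with the simplest (nearest-neighbor) transfer: every output pixel looks up its source location on the input grid. The function name `warp_nearest`, the `fill` value for output pixels whose source falls outside the input image, and the row-major list-of-rows layout are assumptions made for this illustration.

```python
import math


def warp_nearest(f, alpha, beta, fill=0):
    """Output-to-input mapping: g[x, y] = f[alpha(x, y), beta(x, y)].

    f is a list of rows, indexed f[y][x]. alpha and beta return the (possibly
    noninteger) input coordinates for the output pixel [x, y]; the gray value
    is taken from the nearest input pixel.
    """
    h, w = len(f), len(f[0])
    g = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xs = math.floor(alpha(x, y) + 0.5)  # round to the nearest grid point
            ys = math.floor(beta(x, y) + 0.5)
            if 0 <= xs < w and 0 <= ys < h:
                g[y][x] = f[ys][xs]
    return g


if __name__ == "__main__":
    f = [[1, 2, 3],
         [4, 5, 6]]
    # Translation: x' = x + 1, y' = y, so the image content moves one pixel left.
    print(warp_nearest(f, lambda x, y: x + 1, lambda x, y: y))
```

Because the loop runs over output pixels, every output location receives exactly one value; a carryover (input-to-output) loop may instead leave some output pixels unassigned and assign others more than once.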


[Figure: pixel carryover and pixel filling, each shown as an input $f[x,y]$ and an output $g[x',y']$.]

18.4 Pixel Interpolation


Gray-value interpolation is based on the relative distances of the nearest neighbors to
or from the geometrically transformed pixel. In the simplest case (nearest-neighbor
or zeroth-order interpolation), all of the gray value is transferred to or from the
nearest pixel. However, since the distances of the four neighbors must be evaluated to
determine which is closest, it generally is quite easy to “upgrade” nearest-neighbor
calculations to bilinear interpolation. This method divides the gray level of the
noninteger pixel among its four nearest “integer” neighbors in inverse proportion to the
distance; if the transferred pixel is equidistant from the four neighbors, then its gray
value is divided equally; if it is much closer to one of the four grid points, most of
its gray value is transferred to that pixel. For pixel filling, the gray value at the
output pixel $g[x',y']$ is determined from the following equation, where $[x_n, y_n]$ are the
coordinates of the nearest pixels of the input grid to the transformed pixel, and $d_n$ are
the respective distances:

$$g[x',y'] = \frac{\dfrac{f[x_1,y_1]}{d_1} + \dfrac{f[x_2,y_2]}{d_2} + \dfrac{f[x_3,y_3]}{d_3} + \dfrac{f[x_4,y_4]}{d_4}}{\dfrac{1}{d_1} + \dfrac{1}{d_2} + \dfrac{1}{d_3} + \dfrac{1}{d_4}}$$
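The weighting in this equation can be sketched directly. The sketch follows the inverse-distance form written above (which is not identical to the separable bilinear formula used in many libraries); the function name is hypothetical, and two edge cases not covered by the text are handled by assumption: a sample that lands exactly on a grid point returns that pixel's value, and neighbors outside the image are skipped.

```python
import math


def interpolate_inverse_distance(f, x, y):
    """Gray value at noninteger input coordinates [x, y], with f indexed f[y][x].

    The four surrounding grid points are weighted by 1/d_n and normalized.
    """
    x0, y0 = math.floor(x), math.floor(y)
    num = 0.0
    den = 0.0
    for yn in (y0, y0 + 1):
        for xn in (x0, x0 + 1):
            if not (0 <= yn < len(f) and 0 <= xn < len(f[0])):
                continue  # assumption: neighbors off the image are ignored
            d = math.hypot(x - xn, y - yn)
            if d == 0:
                return float(f[yn][xn])  # exactly on a grid point
            num += f[yn][xn] / d
            den += 1.0 / d
    return num / den if den else 0.0  # assumption: 0 if no neighbor is on the image


if __name__ == "__main__":
    f = [[0, 10],
         [20, 30]]
    print(interpolate_inverse_distance(f, 0.5, 0.5))  # equidistant: plain average
    print(interpolate_inverse_distance(f, 0.9, 0.1))  # dominated by f[0][1]
```

At the centre of the four pixels all distances are equal, so the result is the plain average; as the sample approaches one grid point, its weight grows without bound and the result approaches that pixel's gray value.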


[Figure: input grid and output grid.] Schematic of interpolation of pixel gray value.
The input and output grids of pixels are shown as solid and dashed lines, respectively.
The distances of the output pixel from the four nearest neighbors are labeled $d_1$ through $d_4$.

The gray level of the transformed pixel can be divided among more of the neighboring
pixels by using higher-order interpolation, e.g., cubic spline, etc.

18.4.1 Example: Image Rotation

The  examples  are  small  (64  x  64)  images  rotated  by  the  degree  increments  specified. 
The  images  have  been  interpolated  using  the  output-to-input  mapping  and  bilinear 
interpolation.  Note  that  artifacts  are  visible,  particularly  at  45°. 


[Figure: rotated images; panels labeled 30° and 45°.]


Chapter  19 

Global  Operations 


If a pixel in the output image $g$ is a function of (almost) all of the pixels in $f[x,y]$, then
$\mathcal{O}\{f[x,y]\}$ is a global operator. This category includes image coordinate
transformations, of which the most important is the Fourier transform. These transformations
derive new, usually equivalent, representations of images; for example, the Fourier
transform maps from the familiar coordinate-space representation $f[x,y]$ to a new
representation (a new image) whose brightness at each coordinate describes the
quantity of a particular sinusoidal spatial frequency component present in $f[x,y]$. The
sum of the component sinusoids is the original image. In other words, the Fourier
transform generates the frequency-space representation $F[\xi,\eta]$ of the image $f[x,y]$.
The coordinates of the image $[x,y]$ have dimensions of length (e.g., mm) while the
coordinates of the frequency representation $[\xi,\eta]$ have units of inverse length (e.g.,
cycles per mm). Global gray-level properties of the image map to local properties in
the Fourier transform, and vice versa. The frequency-space representation is useful
for many applications, including segmentation, coding, noise removal, and feature
classification. It also provides an avenue for performing other image operations,
particularly convolution. Each output pixel is a function of the gray levels of all
input pixels.


19.1  Relationship  to  Neighborhood  Operations 

The concept of a linear global operator is a simple extension of that of the linear local
neighborhood operator. In that case, an output pixel was calculated by point-by-point
multiplication of pixels in the input image by a set of weights (the kernel) and
summing the products. The convolution at different pixels is computed by shifting
the kernel. Recall that some accommodation must be made for cases where one or
more elements of the kernel are off the edge of the image.

In the case of a global operator, the set of weights is as large as the image and
constitutes a “mask function”, say $q[x,y]$. The output value obtained by applying a
mask $q[x,y]$ to an input image $f[x,y]$ is:

$$g = \iint f[x,y]\, q[x,y]\, dx\, dy$$

In the discrete case, the integral becomes a summation:

$$g = \sum_{n} \sum_{m} f[n,m]\, q[n,m]$$


Note that a translation of the mask by one pixel in any direction shifts some of
its elements over the edge of the image. If we assume that the output in such cases
is undefined, only a single output pixel is calculated from one mask function $q[n,m]$.
In general, different outputs result from different masks, i.e., we can define an output
pixel by using different masks for each output coordinate pair $[k,\ell]$:

$$g[k,\ell] = \sum_{n} \sum_{m} f[n,m]\, q[n,m;k,\ell]$$

[Figure] Schematic of a global operation evaluated at one pixel $[x',y']$. The input image
$f[x,y]$ is multiplied by the specific “mask function” for that output pixel; the product
values are summed to compute the output gray value $g$.

[Figure] Global operation evaluated at two locations in the output; different “mask” functions
are used to evaluate the values at the different locations.


In general, the coordinates of $g$ are different from those of $f$, and often even have
different dimensions (units). The action of the operator is obviously determined by
the form of the mask function. The most common example is the Fourier transform,
where the mask function is:

$$q[x,y;\xi,\eta] = \cos[2\pi(\xi x + \eta y)] - i \sin[2\pi(\xi x + \eta y)] = \exp[-2\pi i(\xi x + \eta y)]$$
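A global operation of this kind can be sketched as a sum of the image weighted by one mask per output coordinate. The names `global_operator` and `fourier_mask` are hypothetical, and the mask below is evaluated only at the discrete frequencies k/N and l/M cycles per pixel, which is an assumption made for the illustration (the mask in the text is defined for continuous frequencies).

```python
import cmath


def global_operator(f, mask, k, l):
    """g[k, l] = sum over all pixels of f[n, m] * q[n, m; k, l], with f indexed f[m][n]."""
    return sum(f[m][n] * mask(n, m, k, l)
               for m in range(len(f)) for n in range(len(f[0])))


def fourier_mask(N, M):
    """Return the Fourier mask q[n, m; k, l] = exp[-2 pi i (k n / N + l m / M)]."""
    def q(n, m, k, l):
        return cmath.exp(-2j * cmath.pi * (k * n / N + l * m / M))
    return q


if __name__ == "__main__":
    f = [[1, 1, 1, 1] for _ in range(4)]     # constant 4 x 4 image
    q = fourier_mask(4, 4)
    print(abs(global_operator(f, q, 0, 0)))  # all of the "energy" is at zero frequency
    print(abs(global_operator(f, q, 1, 0)))  # essentially zero
```

Every output value requires a pass over the entire image, which is what distinguishes a global operator from a neighborhood operator with a small kernel.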


19.2  Discrete  Fourier  Transform  (DFT) 

If the input signal has been sampled at discrete intervals (of width $\Delta x$, for example),
the Fourier integral over $x$ reduces to a sum:

$$F[\xi] = \sum_{n=-\infty}^{+\infty} f[n \cdot \Delta x] \exp[-2\pi i \xi (n \cdot \Delta x)]$$

Recall that the Whittaker-Shannon sampling theorem states that a sinusoidal function
must be sampled at a rate greater than two samples per period to avoid aliasing.
Thus, the minimum period $\lambda_{min}$ of a sampled sinusoidal function is two sample
intervals ($2 \cdot \Delta x$ in the example above), which implies that the maximum spatial
frequency in the sampled signal (the Nyquist frequency) is:

$$\xi_{max} = \xi_{Nyq} = \frac{1}{\lambda_{min}} = \frac{1}{2 \cdot \Delta x}$$

$\xi_{max}$ is measured in cycles per unit length (typically cycles per millimeter). Often the
absolute scale of the digital image is not important, and the frequency is scaled to
$\Delta x = 1$ pixel, i.e., the maximum spatial frequency is $\frac{1}{2}$ cycle per pixel. The range of
meaningful spatial frequencies of the DFT is $|\xi| \leq \xi_{Nyq}$.

If the input function $f[x]$ is limited to $N$ samples, the DFT becomes a finite sum:

$$F[\xi] = \sum_{n=0}^{N-1} f[n \cdot \Delta x] \exp[-2\pi i \xi (n \cdot \Delta x)]$$

$$\text{or} \quad F[\xi] = \sum_{n=-\frac{N}{2}}^{\frac{N}{2}-1} f[n \cdot \Delta x] \exp[-2\pi i \xi (n \cdot \Delta x)]$$

The DFT of a 1-D sequence of $N$ samples at regular intervals $\Delta x$ can be computed
at any spatial frequency $\xi$. However, it is usual to calculate the DFT at a sequence
of frequencies (e.g., a total of $M$) separated by a constant interval $\Delta\xi$. Each sample
of the DFT of a real sequence of $N$ pixels requires that $N$ values each of the cosine
and sine be computed, followed by $2N$ multiplications and $2N$ sums, i.e., of the
order of $N$ operations. The DFT at $M$ spatial frequencies requires of the order of
$M \cdot N$ operations. Often, the DFT is computed at $N$ frequencies, thus requiring of
the order of $N^2$ operations. This intensity of computation made calculation of the
DFT a tedious and rarely performed task before digital computers. For example, a
Fourier deconvolution of seismic traces for petroleum exploration was performed by
Enders Robinson in 1951; it took the whole summer to do 32 traces by hand with a
memoryless mechanical calculator. This task could now be done with the cheapest PC
in less than a second. Even with mainframe digital computers into the 1960s, digital
Fourier analysis was unusual because of the computation time. In 1965, J.W. Cooley
and J.W. Tukey developed the Fast Fourier Transform algorithm, which substantially
cut computation times and made digital Fourier analysis feasible.
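The operation count is easy to see in a direct evaluation of the finite sum. The sketch below computes one DFT sample at an arbitrary spatial frequency, using the sum over n = 0 to N - 1; the function name `dft_sample` and the default sample interval of one pixel are assumptions for illustration.

```python
import cmath
import math


def dft_sample(samples, xi, dx=1.0):
    """F[xi] = sum_n f[n dx] exp[-2 pi i xi (n dx)] for n = 0 .. N-1.

    One call costs of the order of N operations; evaluating M frequencies
    costs of the order of M * N.
    """
    return sum(value * cmath.exp(-2j * cmath.pi * xi * n * dx)
               for n, value in enumerate(samples))


if __name__ == "__main__":
    N = 8
    # One full cycle of a cosine across the N samples: frequency 1/8 cycle per pixel.
    f = [math.cos(2 * math.pi * n / N) for n in range(N)]
    for xi in (0.0, 0.125, 0.2, 0.25):
        print(xi, abs(dft_sample(f, xi)))
```

The frequency argument is not restricted to a grid: the sample at 0.2 cycles per pixel is as easy to compute as the one at 0.125, which is the flexibility that a fast algorithm gives up.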


19.3  Fast  Fourier  Transform  (FFT) 

The FFT was developed to compute discrete Fourier spectra with fewer operations
than the DFT by sacrificing some flexibility. The DFT may compute the amplitude
of sinusoidal components at any frequency within the Nyquist window, i.e., the DFT
maps discrete coordinates $n \cdot \Delta x$ to a continuous set of frequencies $\xi$ in the interval
$[-\xi_{Nyq}, \xi_{Nyq}]$. The DFT may be computed at a single spatial frequency if desired.
The FFT is a recursive algorithm that calculates the spectrum at a fixed discrete set
of frequencies with a minimum number of repetitive calculations. The spectrum must
be computed at all frequencies to obtain the values of individual spectral components.
In the FFT, the amplitudes at $N$ discrete equally spaced frequencies are computed in
the interval $[-\xi_{Nyq}, \xi_{Nyq}]$ from $N$ input samples. The frequency samples are indexed
by the integer $k$ and the interval between frequency samples is:

$$\Delta\xi = \frac{2 \cdot \xi_{Nyq}}{N} = \frac{1}{N} \cdot \frac{2}{2 \cdot \Delta x} = \frac{1}{N \cdot \Delta x}$$

$$\xi_k = k \cdot \Delta\xi = \frac{k}{N \cdot \Delta x}$$

$$N \cdot \Delta x \cdot \Delta\xi = 1$$

$$-\frac{N}{2} \leq k \leq \frac{N}{2} - 1$$

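If we substitute these specific frequencies into the DFT:

$$F[k \cdot \Delta\xi] = \sum_{n=-\frac{N}{2}}^{\frac{N}{2}-1} f[n \cdot \Delta x] \exp[-2\pi i\, k \cdot \Delta\xi \cdot (n \cdot \Delta x)]$$

$$= \sum_{n=-\frac{N}{2}}^{\frac{N}{2}-1} f[n \cdot \Delta x] \exp\left[-2\pi i\, k n \cdot \frac{\Delta x}{N \cdot \Delta x}\right] \quad \text{because } \Delta\xi = \frac{1}{N \cdot \Delta x}$$

$$F[k \cdot \Delta\xi] = \sum_{n=-\frac{N}{2}}^{\frac{N}{2}-1} f[n \cdot \Delta x] \exp\left[-\frac{2\pi i\, k n}{N}\right]$$

If $\Delta x$ is assumed to be a dimensionless sample, then the Nyquist frequency is fixed
at:

$$\xi_{Nyquist} = \frac{1}{2}\ \frac{\text{cycle}}{\text{sample}} = \pi\ \frac{\text{radians}}{\text{sample}}$$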

Recall that the DFT assumes that the sample interval is $\Delta x$ and computes a periodic
spectrum with period $\frac{1}{\Delta x}$. In the FFT, the spectrum is assumed to be sampled at
intervals $\Delta\xi = \frac{1}{N \cdot \Delta x}$, which implies in turn that the input function is periodic with
period $N \cdot \Delta x$. If $N$ is a power of 2 (e.g., 128, 256, 512, ...), there are only $N$ distinct
values of the complex exponential $\exp\left[-\frac{2\pi i\, n k}{N}\right]$ to calculate. By using this fact, the
number of required operations may be reduced and processing speeded up. The FFT
of $N$ samples requires of the order of $N \cdot \log_2[N]$ operations vs. $O\{N^2\}$ for the DFT.
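One common way to exploit the repeated exponentials is the recursive radix-2 decimation-in-time scheme sketched below. It is offered as an illustration of the idea rather than as the specific algorithm of the text; the function name `fft` is hypothetical, and the indices run over n, k = 0 to N - 1 instead of the centered range, which is equivalent because both the samples and the spectrum are periodic with period N.

```python
import cmath


def fft(x):
    """Radix-2 FFT: F[k] = sum_n x[n] exp[-2 pi i n k / N] for k = 0 .. N-1.

    len(x) must be a power of 2. Each level of recursion does of the order of
    N operations and there are log2(N) levels.
    """
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("length must be a power of 2")
    even = fft(x[0::2])  # spectrum of the even-indexed samples
    odd = fft(x[1::2])   # spectrum of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle  # the same product is reused with a sign change
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 0, 0, 0, 0]
    for value in fft(x):
        print(f"{value.real:8.4f} {value.imag:8.4f}i")
```

Each product of an exponential and an odd-sample spectrum value is used twice, once with each sign, which is where the saving over the direct double sum arises.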


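Since both representations $f[n \cdot \Delta x]$ and $F[k \cdot \Delta\xi]$ are sampled and periodic, the
inverse FFT is a finite summation and is proportional to:

$$f[n \cdot \Delta x] = C \sum_{k=-\frac{N}{2}}^{\frac{N}{2}-1} F[k \cdot \Delta\xi] \exp\left[+\frac{2\pi i\, n k}{N}\right]$$

The proportionality constant $C$ is required to ensure that $\mathcal{F}^{-1}\{F[k]\} = f[n]$, and may
be found by substituting the formula for the forward FFT for $F[k \cdot \Delta\xi]$:

$$f[n \cdot \Delta x] = C \sum_{k=-\frac{N}{2}}^{\frac{N}{2}-1} \left( \sum_{m=-\frac{N}{2}}^{\frac{N}{2}-1} f[m \cdot \Delta x] \exp\left[-\frac{2\pi i\, m k}{N}\right] \right) \exp\left[+\frac{2\pi i\, n k}{N}\right]$$

$$= C \sum_{k=-\frac{N}{2}}^{\frac{N}{2}-1} \sum_{m=-\frac{N}{2}}^{\frac{N}{2}-1} f[m \cdot \Delta x] \exp\left[+\frac{2\pi i\, k}{N}(n - m)\right]$$

$$= C \sum_{m=-\frac{N}{2}}^{\frac{N}{2}-1} f[m \cdot \Delta x] \sum_{k=-\frac{N}{2}}^{\frac{N}{2}-1} \exp\left[+\frac{2\pi i\, k}{N}(n - m)\right]$$

$$= C \sum_{m=-\frac{N}{2}}^{\frac{N}{2}-1} f[m \cdot \Delta x] \cdot (N \cdot \delta[n - m])$$

$$= C \cdot N \cdot f[n \cdot \Delta x]$$

Thus $C = N^{-1}$ and the inverse FFT may be defined as:

$$f[n \cdot \Delta x] = \frac{1}{N} \sum_{k=-\frac{N}{2}}^{\frac{N}{2}-1} F[k \cdot \Delta\xi] \exp\left[+\frac{2\pi i\, n k}{N}\right]$$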

The proportionality constant is a scale factor that is only significant when cascading forward and inverse transforms and may be applied in either direction. Many conventions (including mine) include the proportionality constant in the forward FFT:

$$F[k\cdot\Delta\xi] = F\left[\frac{k}{N\cdot\Delta x}\right] = \frac{1}{N}\sum_{n=-N/2}^{N/2-1} f[n\cdot\Delta x]\,\exp\left[-\frac{2\pi i\,nk}{N}\right]$$

$$f[n\cdot\Delta x] = \sum_{k=-N/2}^{N/2-1} F[k\cdot\Delta\xi]\,\exp\left[+\frac{2\pi i\,nk}{N}\right]$$

$$N\cdot\Delta x\cdot\Delta\xi = 1$$

$$\xi_{\mathrm{Nyq}} = \xi_{\max} = \frac{1}{2\cdot\Delta x}$$

19.4 DFTs of Images

The  concept  of  a  1-D  Fourier  transform  can  be  easily  extended  to  multidimensional 
continuous  or  discrete  signals.  The  continuous  2-D  transform  is  defined  as: 

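$$\mathcal{F}_2\{f[x,y]\} = F[\xi,\eta] = \iint_{-\infty}^{+\infty} f[x,y]\,\exp\left[-2\pi i\,(\xi x + \eta y)\right]\,dx\,dy$$

For a uniformly sampled discrete 2-D function $f[n,m]$, the transform is a summation:

$$\mathcal{F}_2\{f[n\cdot\Delta x, m\cdot\Delta y]\} = \sum_{n=-\infty}^{+\infty}\;\sum_{m=-\infty}^{+\infty} f[n,m]\,\exp\left[-2\pi i\,(\xi n\cdot\Delta x + \eta m\cdot\Delta y)\right] = F[\xi,\eta]$$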

The Fourier transform of a real-valued 2-D function is Hermitian (even real part and odd imaginary part). The Fourier transform of the image of a 2-D cosine is a pair
of  delta-function  spikes  at  a  distance  from  the  origin  proportional  to  the  frequency 
of  the  cosine,  as  shown  on  the  next  page.  The  polar  angle  of  the  spikes  relative  to 
the  origin  indicates  the  direction  of  variation  of  the  cosine,  while  the  brightness  of 
the  spikes  is  proportional  to  the  amplitude  of  the  cosine.  Notice  that  the  Fourier 
transform  of  the  sum  of  two  cosine  waves  is  the  sum  of  the  individual  transforms  (i.e., 
the  Fourier  transform  is  linear). 

If the sampled input image $f[n,m]$ has $N\times N$ pixels, then there are only $N^2$ pieces of information in the input. Thus the DFT $F[\xi,\eta]$ also must contain at most only $N^2$ independent pieces of information. This allows the DFT to be sampled without loss of information:

$$F[\xi,\eta] \rightarrow F[k\cdot\Delta\xi,\ \ell\cdot\Delta\eta] \rightarrow F[k,\ell]$$


It is easy to show (see linear math course) that

$$\Delta\xi = \frac{1}{N\cdot\Delta x}, \qquad \Delta\eta = \frac{1}{N\cdot\Delta y}$$


The  2-D  transform  has  the  same  properties  mentioned  before,  including  that  global 
properties  become  local  properties  and  vice  versa.  This  is  the  primary  reason  why  the 
Fourier  transform  is  such  a  powerful  tool  for  image  processing  and  pattern  recognition; 
$F[\xi,\eta]$ is uniquely defined for each $f[x,y]$, and the global properties of $f[x,y]$ are concentrated as local properties of $F[\xi,\eta]$. Therefore:

1. local modification of $f[n,m]$ $\Longrightarrow$ global modification of $F[k,\ell]$

2. global modification of $f[n,m]$ $\Longrightarrow$ local modification of $F[k,\ell]$

Local modification in the space domain is what we call “filtering” and is intimately related to the local operation of convolution that we’ve already discussed. In fact, it is easy to prove that the Fourier transform of a convolution is the product of the Fourier transforms of the component functions. This result is called the filter theorem.

$$\mathcal{F}_2\{f[n,m] * h[n,m]\} = \mathcal{F}_2\{f[n,m]\}\cdot\mathcal{F}_2\{h[n,m]\}$$

$$\Longrightarrow f * h = \mathcal{F}_2^{-1}\{F[k,\ell]\cdot H[k,\ell]\}$$

We have already given the name impulse response or point spread function to $h[x,y]$; the representation $H[\xi,\eta]$ is called the transfer function of the system. The most common reason for computing Fourier transforms of digital signals or images is to use this path for convolution. Examples are shown for both lowpass filtering and differentiation (highpass filtering).
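
The filter theorem can be checked numerically with a short sketch. The example below is an illustration only: it works in 1-D rather than 2-D, uses circular (periodic) convolution because the DFT assumes periodic arrays, and places the 1/N factor in the inverse transform. The helper names `dft`, `idft`, and `circular_convolve` are hypothetical, and the first-difference kernel is an assumed 1-D analogue of the differentiation example.

```python
import cmath

def dft(f, sign=-1):
    """N-point DFT with kernel exp(sign*2*pi*i*n*k/N) and no scale factor."""
    N = len(f)
    return [sum(f[n] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(F):
    """Inverse DFT; the 1/N factor is applied here."""
    N = len(F)
    return [v / N for v in dft(F, sign=+1)]

def circular_convolve(f, h):
    """Space-domain circular convolution: g[n] = sum_m f[m] * h[(n - m) mod N]."""
    N = len(f)
    return [sum(f[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]

f = [0, 0, 1, 1, 1, 1, 0, 0]        # a 1-D "bar"
h = [-1, 0, 0, 0, 0, 0, 0, 1]       # h[n] = delta[n + 1] - delta[n] (index -1 wraps to 7)

direct = circular_convolve(f, h)                       # space-domain path
F, H = dft(f), dft(h)
via_fourier = idft([a * b for a, b in zip(F, H)])      # frequency-domain path
max_diff = max(abs(a - b) for a, b in zip(direct, via_fourier))
```

Both paths produce the first difference f[n + 1] − f[n], which is nonzero only at the two edges of the bar; this is the 1-D counterpart of an output that shows only edges.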


Averaging of a bitonal “E” in the vertical direction with an impulse response $h[n,m]$ that is the product of $\delta_d[n]$ and a RECT function of $m$; the transfer function is “skinny” in the $\eta$-direction, thus attenuating sinusoidal components that oscillate in the vertical direction.

Differentiation in the x-direction of a pair of “E”s applied in the Fourier domain. The impulse response is $h[n,m] = \delta_d[n+1,m] - \delta_d[n,m]$; the MTF is large for $|k| \gg 0$, so it enhances high-frequency sinusoids that oscillate in the horizontal direction. The output shows just the vertical edges of the “E”s.

19.5 Other Global Operations


Other families of mask functions may be used in the general coordinate transformation equation:

$$g[u] = \int_{-\infty}^{+\infty} f[x]\,q[x;u]\,dx \quad\text{(1-D continuous functions)}$$

$$g[k] = \sum_{n} f[n]\,q[n;k] \quad\text{(1-D discrete functions)}$$

$$g[u,v] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f[x,y]\,q[x,y;u,v]\,dx\,dy \quad\text{(2-D continuous functions)}$$

$$g[k,\ell] = \sum_{n}\sum_{m} f[n,m]\,q[n,m;k,\ell] \quad\text{(2-D discrete functions)}$$


Such a transformation is invertible if $f[x,y]$ can be derived from $g[u,v]$ via an expression of the form:

$$f[x] = \int_{-\infty}^{+\infty} g[u]\,q'[x;u]\,du \quad\text{(1-D continuous functions)}$$

$$f[n] = \sum_{k} g[k]\,q'[n;k] \quad\text{(1-D discrete functions)}$$

$$f[x,y] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g[u,v]\,q'[x,y;u,v]\,du\,dv \quad\text{(2-D continuous functions)}$$

$$f[n,m] = \sum_{k}\sum_{\ell} g[k,\ell]\,q'[n,m;k,\ell] \quad\text{(2-D discrete functions)},$$

where $q'$ is the 4-D mask function for the inverse transform. For the purposes of this course it is not really essential to understand the conditions for the transformation to be invertible, so we will just say that the set of mask functions $q[x,y;u,v]$ must be complete, i.e., any function $f[x,y]$ must be representable as a sum (linear combination) of the mask functions.
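
As a minimal sketch of the 1-D discrete form, the general transformation is just a weighted sum of the input against a mask function. In the example below, the helper `transform` and the masks `q` and `q_inv` are hypothetical names; the DFT kernel is chosen as the mask only because it was defined earlier in these notes, and any other complete set of masks could be substituted.

```python
import cmath

def transform(values, mask):
    """General 1-D discrete transform: out[k] = sum_n values[n] * mask(n, k)."""
    N = len(values)
    return [sum(values[n] * mask(n, k) for n in range(N)) for k in range(N)]

N = 4

def q(n, k):
    """Forward mask q[n; k]: here the DFT kernel (an assumed example)."""
    return cmath.exp(-2j * cmath.pi * n * k / N)

def q_inv(k, n):
    """Inverse mask q'[n; k]: conjugate kernel scaled by 1/N."""
    return cmath.exp(+2j * cmath.pi * n * k / N) / N

f = [3, 1, 4, 1]
g = transform(f, q)            # forward transform g[k]
f_back = transform(g, q_inv)   # inverse transform recovers f[n]
round_trip_error = max(abs(a - b) for a, b in zip(f, f_back))
```

The round trip succeeds because this particular set of masks is complete; a set of masks that cannot represent every input would lose information and the inverse would not exist.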


19.6  Discrete  Cosine  Transform  (DCT) 


The  discrete  cosine  transform  (DCT)  is  a  relative  of  the  discrete  Fourier  transform 
that  is  often  used  in  digital  image  coding  or  compression.  The  term  is  used  to  denote 
any  of  several  related  algorithms,  but  we  will  consider  only  the  even  symmetric  DCT. 
Consider a 1-D sampled image $f[n]$ with $N$ pixels in the interval $0 \le n \le N-1$. Recall that the DFT assumed that the $N$-pixel image $f[n]$ is actually of infinite extent but is periodic over $N$ samples, as below:

Arrangement of pixel blocks in discrete Fourier transform (DFT) and discrete cosine transform (DCT). Pixel blocks in the DFT are strictly periodic, thus possibly producing large discontinuities in gray value at the edges; pixel blocks in the DCT are reversed, replicating gray values across the edges of blocks and reducing the discontinuities at the edges.

The DCT of $f[n]$ builds a new function $g[n]$ that is periodic over $2N$ samples in the interval $0 \le n \le 2N-1$:

$$g[n] = f[n], \quad 0 \le n \le N-1$$
$$g[n] = f[2N-1-n], \quad N \le n \le 2N-1$$

The DCT is the $2N$-point DFT of $g[n]$. In this manner, the discontinuities at the edges of the image are eliminated, thus removing leakage from the transform representation. The energy in the transform will not spread over so many pixels, and thus the DCT representation will be more compact than the DFT. In addition, the $2N$-point DCT will be real because the new function $g[n]$ is intrinsically even (about a point midway between two samples, an offset that is compensated by a phase factor as derived later in this section).

The 2-D DCT is a simple extension of the 1-D version. Given an $N\times N$ image $f[n,m]$, a $2N\times 2N$ version $g[n,m]$ is created by replication and reflection. The new image will be smoothly periodic over $2N\times 2N$ so that leakage will not be present.


To reduce the impact of the sharp transitions that often occur at the edges of blocks, as well as to obtain a transform that is real-valued for a real-valued input image, the discrete cosine transform (DCT) is used instead of the DFT. The DCT has become very important in the image compression community, being the basis transformation for the JPEG and MPEG compression standards. The DCT of an $M\times M$ block may be viewed as the DFT of a synthetic $2M\times 2M$ block that is created by replicating the original $M\times M$ block after folding about the vertical and horizontal edges:

The original 4 × 4 block of image data is replicated 4 times to generate an 8 × 8 block of data via the DFT format and an 8 × 8 DCT block by appropriate reversals. The transitions at the edges of the 4 × 4 DCT blocks do not exhibit the “sharp” edges in the 4 × 4 DFT blocks.

The edge discontinuities of the resulting $2M\times 2M$ block have smaller amplitudes. The symmetries of the Fourier transform for a real-valued image ensure that the original $M\times M$ block may be reconstructed from the DCT of the $2M\times 2M$ block.

Consider the computation of the DCT for a 1-D $M$-pixel block $f[n]$ ($0 \le n \le M-1$). The $2M$-pixel synthetic array $g[n]$ is indexed over $n$ ($0 \le n \le 2M-1$) and has the form:

$$g[n] = \begin{cases} f[n] & \text{if } 0 \le n \le M-1 \\ f[2M-1-n] & \text{if } M \le n \le 2M-1 \end{cases}$$

In the case $M = 8$, the array $g[n]$ is defined:

$$g[n] = \begin{cases} f[n] & \text{if } 0 \le n \le 7 \\ f[15-n] & \text{if } 8 \le n \le 15 \end{cases}$$

The values of $g[n]$ for $8 \le n \le 15$ are a “reflected replica” of $f[n]$:

g[8] = f[7]

g[9] = f[6]

g[10] = f[5]

g[11] = f[4]

g[12] = f[3]

g[13] = f[2]

g[14] = f[1]

g[15] = f[0]

If the “new” array $g[n]$ is assumed to be periodic over $2M$ samples, its amplitude is defined for all $n$, e.g., for $M = 8$:

$$g[n] = \begin{cases} f[-1-n] & \text{if } -M \le n \le -1 \ \Longrightarrow\ -8 \le n \le -1 \\ f[n+2M] & \text{if } -2M \le n \le -M-1 \ \Longrightarrow\ -16 \le n \le -9 \end{cases}$$

Note that the 16-sample block $g[n]$ is NOT symmetric about the origin of coordinates because $g[-1] = g[0]$; to be symmetric, $g[-\ell]$ would have to equal $g[+\ell]$. For example, consider a 1-D example where $f[n]$ is an 8-pixel ramp as shown:

DFT of an 8-pixel “ramp”: (a) $f_1[n] = n\cdot\mathrm{STEP}[n]$ for $2N = 16$; (b) $\Re\{F_1[k]\}$; (c) $\Im\{F_1[k]\}$, showing the redundancy of the complex spectrum and the high-frequency terms due to the discontinuous transition at the edge.


The $2M$-point representation of $f[n]$ is the $g[n]$ just defined:

$$g[n] = \begin{cases} f[n] & \text{for } 0 \le n \le 7 \\ f[2M-1-n] & \text{for } 8 \le n \le 15 \end{cases}$$

If this function were symmetric (even), then circular translation of the 16-point array by 8 pixels to generate $g[(n-8) \bmod 16]$ would also be an even function.

From the graph, it is apparent that the translated array is not symmetric about the origin; rather, it has been translated by $-\frac{1}{2}$ pixel from symmetry in the $2M$-pixel array. Thus define a new 1-D array $c[n]$ that is shifted by $\frac{1}{2}$ pixel:

$$c[n] = g\left[n - \tfrac{1}{2}\right]$$

This result may seem confusing at first; how can a sampled array be translated by $\frac{1}{2}$ pixel? For the answer, consider the continuous Fourier transform of a sampled array translated by $\frac{1}{2}$ unit:

$$\mathcal{F}_1\{c[n]\} = C[\xi] = \mathcal{F}_1\left\{g\left[x - \tfrac{1}{2}\right]\right\} = \mathcal{F}_1\left\{g[x] * \delta\left[x - \tfrac{1}{2}\right]\right\}$$

$$= G[\xi]\cdot\exp\left[-2\pi i\,\xi\cdot\tfrac{1}{2}\right] \ \Longrightarrow\ C[\xi] = G[\xi]\cdot\exp\left[-i\pi\xi\right]$$

Thus the effect of translation by $\frac{1}{2}$ pixel in the space domain is multiplication of the Fourier transform by the specific linear-phase factor:

$$\exp\left[-i\pi\xi\right] = \cos[\pi\xi] - i\,\sin[\pi\xi].$$

The $2M$-point DFT of the symmetric discrete array (original array translated by $\frac{1}{2}$ pixel) has the form:

$$\mathcal{F}_{2M}\{c[n]\} = \mathcal{F}_{2M}\left\{g[n] * \delta\left[n - \tfrac{1}{2}\right]\right\} = G[k]\cdot\exp\left[-i\pi\,\frac{k}{2M}\right]$$

$$C[k] = G[k]\cdot\exp\left[-\frac{i\pi k}{2M}\right] = G[k]\cdot\left(\cos\left[\frac{\pi k}{2M}\right] - i\,\sin\left[\frac{\pi k}{2M}\right]\right)$$

where the continuous spatial frequency $\xi$ has been replaced by the sampled frequency $\frac{k}{2M}$. This function $C[k]$ is the DCT of $f[n]$. Because the $2M$-point translated function $c[n]$ is real and even, so must be the $2M$-point discrete spectrum $C[k]$; therefore only $M$ samples of the spectrum are independent. This array is the DCT of $f[n]$.

Discrete cosine transform of ramp function is equivalent to the DFT of the symmetric function shown in (a) $f_2[n] = f_1[n] + f_1[-(n+1)]$ for $2N = 16$; (b) $\Re\{F_2[k]\}$; (c) $\Im\{F_2[k]\}$, showing the reduction in the relative amplitude of the imaginary part compared to $F_1[k]$.


19.6.1  Steps  in  Forward  DCT 

To  summarize,  the  steps  in  the  computation  of  the  1-D  DCT  of  an  M-point  block 
f[n]  are: 

1. create a $2M$-point array $g[n]$ from the $M$-point array $f[n]$:

$$g[n] = f[n] \ :\ 0 \le n \le M-1$$

$$g[n] = f[2M-1-n] \ :\ M \le n \le 2M-1$$

2. compute the $2M$-point DFT of $g[n] \rightarrow G[k]$

3. the $M$-point DCT $C[k] = \exp\left[-\frac{i\pi k}{2M}\right]\cdot G[k]$ for $0 \le k \le M-1$

The entire process may be cast into the form of a single equation, though the algebra required to get there is a bit tedious:

$$C[k] = \sum_{n=0}^{M-1} 2\,f[n]\,\cos\left(\pi k\cdot\frac{2n+1}{2M}\right) \quad\text{for } 0 \le k \le M-1$$
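
The three steps and the single-equation form can be compared numerically. The sketch below is illustrative only: it assumes a DFT with no 1/N factor in the forward direction (the convention under which the two forms agree), and the function names `dct_via_dft` and `dct_direct` are hypothetical.

```python
import cmath
import math

def dct_via_dft(f):
    """Forward DCT by the three steps: reflect, 2M-point DFT, phase factor."""
    M = len(f)
    g = list(f) + list(reversed(f))                 # step 1: g[n] = f[2M-1-n] for n >= M
    G = [sum(g[n] * cmath.exp(-2j * cmath.pi * n * k / (2 * M))
             for n in range(2 * M))
         for k in range(2 * M)]                     # step 2: unscaled 2M-point DFT
    return [cmath.exp(-1j * cmath.pi * k / (2 * M)) * G[k]
            for k in range(M)]                      # step 3: half-pixel phase factor

def dct_direct(f):
    """Single-equation form: C[k] = sum_n 2 f[n] cos(pi k (2n+1) / (2M))."""
    M = len(f)
    return [sum(2 * f[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * M))
                for n in range(M))
            for k in range(M)]

ramp = list(range(8))                               # 8-pixel ramp
C_steps = dct_via_dft(ramp)
C_direct = dct_direct(ramp)
max_imag = max(abs(c.imag) for c in C_steps)
max_diff = max(abs(c.real - d) for c, d in zip(C_steps, C_direct))
```

For the ramp, the imaginary parts of the three-step result vanish to within rounding error and the real parts match the cosine sum, which is the sense in which the DCT is real-valued.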

19.6.2  Steps  in  Inverse  DCT 

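The inverse DCT is generated by applying the procedures in the opposite order:

1. create a $2M$-point array $G[k]$ from the $M$-point DCT $C[k]$:

$$G[k] = \exp\left[+\frac{i\pi k}{2M}\right]\cdot C[k] \quad\text{for } 0 \le k \le M-1$$

$$G[k] = -\exp\left[+\frac{i\pi k}{2M}\right]\cdot C[2M-k] \quad\text{for } M+1 \le k \le 2M-1$$

(the remaining sample is $G[M] = 0$, because the reflected halves of $g[n]$ cancel at that frequency)

2. compute the inverse $2M$-point DFT of $G[k] \rightarrow g[n]$

3. $f[n] = g[n]$ for $0 \le n \le M-1$

The single expression for the inverse DCT is:

$$f[n] = \frac{1}{M}\sum_{k=0}^{M-1} w[k]\,C[k]\,\cos\left[\pi k\cdot\frac{2n+1}{2M}\right] \quad\text{for } 0 \le n \le M-1$$

where $w[k] = \frac{1}{2}$ for $k = 0$ and $w[k] = 1$ for $1 \le k \le M-1$.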
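
A quick round-trip check of the forward and inverse cosine sums is sketched below. This is only an illustration under the scaling used in this section (factor of 2 in the forward sum, 1/M and the weight w[k] in the inverse); the function names `dct` and `idct` and the sample gray values are hypothetical.

```python
import math

def dct(f):
    """Forward DCT: C[k] = sum_n 2 f[n] cos(pi k (2n+1) / (2M))."""
    M = len(f)
    return [sum(2 * f[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * M))
                for n in range(M))
            for k in range(M)]

def idct(C):
    """Inverse DCT: f[n] = (1/M) sum_k w[k] C[k] cos(pi k (2n+1) / (2M))."""
    M = len(C)
    weights = [0.5] + [1.0] * (M - 1)               # w[0] = 1/2, w[k] = 1 otherwise
    return [sum(weights[k] * C[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * M))
                for k in range(M)) / M
            for n in range(M)]

block = [52, 55, 61, 66, 70, 61, 64, 73]            # assumed 8-pixel gray values
recovered = idct(dct(block))
max_error = max(abs(a - b) for a, b in zip(block, recovered))
```

The recovered block matches the input to within floating-point rounding, confirming that the weight of 1/2 on the k = 0 term is what makes the pair invertible.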


19.7 Walsh-Hadamard Transform

A transform which has proven useful in image compression and pattern recognition grew out of matrix theory and was first described by Hadamard in 1893. A modification made by Walsh in 1923 is commonly used, and so the transformation is often named after both men. The W-H transform resembles a Fourier transform with a mask that has been thresholded so that there are only two values, ±1. We will first consider the W-H transform of a one-dimensional function $f[n]$. The 1-D W-H transform of a two-pixel image is derived from two mask functions which may be represented as vectors:

$$g[k] = \sum_{n=0}^{1} f[n]\,q[n;k] \quad\text{(1-D discrete functions)}$$

$$q[0,0] = +1, \quad q[0,1] = +1$$

$$q[1,0] = +1, \quad q[1,1] = -1$$



Thus the elements of the 2-pixel W-H transform are the sum and difference of the pixel gray values, which is identical to the 2-pixel Fourier transform. The vectors $\mathbf{m}_0$ and $\mathbf{m}_1$ (the mask functions $q[n;0]$ and $q[n;1]$) are orthogonal (i.e., perpendicular) so that:

$$\mathbf{m}_i\cdot\mathbf{m}_j = (\mathbf{m}_i)_0\cdot(\mathbf{m}_j)_0 + (\mathbf{m}_i)_1\cdot(\mathbf{m}_j)_1 = \begin{cases} 0 & \text{if } i \ne j \\ 2 & \text{if } i = j \end{cases}$$

The column vectors that define the mask can be assembled into a 2 × 2 orthogonal matrix:

$$\mathbf{H}_2 = \begin{bmatrix} +1 & +1 \\ +1 & -1 \end{bmatrix}$$

$$\mathbf{H}_2\,\mathbf{H}_2^T = \begin{bmatrix} +1 & +1 \\ +1 & -1 \end{bmatrix}\begin{bmatrix} +1 & +1 \\ +1 & -1 \end{bmatrix} = \begin{bmatrix} +2 & 0 \\ 0 & +2 \end{bmatrix} = 2\,\mathbf{I}$$

where $\mathbf{I}$ is the 2 × 2 identity matrix. Note that $\mathbf{H}_2$ is identical to its transpose $\mathbf{H}_2^T$ (i.e., it is symmetric) and that the product $\mathbf{H}_2\,\mathbf{H}_2^T$ is proportional to the identity matrix, which (to within a scale factor) is the criterion that defines an orthogonal matrix.

The inverse matrix $\mathbf{H}_2^{-1}$ is defined as the matrix that satisfies:

$$\mathbf{H}_2\,\mathbf{H}_2^{-1} = \mathbf{I}$$

which shows that the inverse W-H transform is proportional to the transpose of $\mathbf{H}_2$, and thus to $\mathbf{H}_2$ itself:

$$\mathbf{H}_2^{-1} = \frac{1}{2}\,\mathbf{H}_2^T = \frac{1}{2}\begin{bmatrix} +1 & +1 \\ +1 & -1 \end{bmatrix}$$

The four-pixel W-H matrix is obtained by the direct product of the two-pixel W-H matrix with itself, i.e.,

$$\mathbf{H}_4 = \mathbf{H}_2\times\mathbf{H}_2 = \begin{bmatrix} \mathbf{H}_2 & \mathbf{H}_2 \\ \mathbf{H}_2 & -\mathbf{H}_2 \end{bmatrix} = \begin{bmatrix} +1 & +1 & +1 & +1 \\ +1 & -1 & +1 & -1 \\ +1 & +1 & -1 & -1 \\ +1 & -1 & -1 & +1 \end{bmatrix}$$

Thus when applied to a 4-pixel image with values $f[n]$ for $0 \le n \le 3$, the W-H transform is a 4-pixel image with gray values:

$$w[0] = f[0] + f[1] + f[2] + f[3]$$

$$w[1] = f[0] - f[1] + f[2] - f[3]$$

$$w[2] = f[0] + f[1] - f[2] - f[3]$$

$$w[3] = f[0] - f[1] - f[2] + f[3]$$

Note that the elements of $w[k]$ are sums and differences of the elements of $f[n]$, and thus no multiplication is required. Also the elements of the W-H transform representation will be integers if the input gray values are integers. This is the source of one useful characteristic of the W-H transform: that the transform representation need not be requantized for display. Also note that the elements of the mask functions of the W-H transform are all purely real numbers so that the W-H transform of a real image is real.

The inverse 4-pixel W-H transform is easily confirmed to be:

$$f[0] = \tfrac{1}{4}\left(w[0] + w[1] + w[2] + w[3]\right)$$

$$f[1] = \tfrac{1}{4}\left(w[0] - w[1] + w[2] - w[3]\right)$$

$$f[2] = \tfrac{1}{4}\left(w[0] + w[1] - w[2] - w[3]\right)$$

$$f[3] = \tfrac{1}{4}\left(w[0] - w[1] - w[2] + w[3]\right)$$
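
A short numerical check of the 4-pixel transform pair may help. The sketch below is illustrative: the function name `wht` and the sample gray values are assumptions, and the multiplications by ±1 in the code stand in for the sign selection that a practical implementation would perform with additions and subtractions only.

```python
# Natural-order 4 x 4 Walsh-Hadamard matrix, H4 = H2 x H2.
H4 = [
    [+1, +1, +1, +1],
    [+1, -1, +1, -1],
    [+1, +1, -1, -1],
    [+1, -1, -1, +1],
]

def wht(values, H):
    """Apply the matrix H to a 1-D list; each output is a signed sum of the inputs."""
    return [sum(h * v for h, v in zip(row, values)) for row in H]

f = [10, 12, 11, 9]                           # assumed 4-pixel gray values
w = wht(f, H4)                                # forward transform
f_back = [v // len(H4) for v in wht(w, H4)]   # inverse: same matrix, scaled by 1/4
```

With these inputs the transform is w = [42, 0, 2, -4]: all integers, as expected for integer gray values, and applying the same matrix again followed by division by 4 returns the original pixels.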


Recall that the elements of the Fourier transform are ordered in terms of the spatial frequency of the mask functions. In similar fashion, the W-H transform elements can be ordered in terms of the number of sign changes of the mask function, which is called the sequency of the term. The sequency-ordered Walsh-Hadamard transformation matrix is:

$$\begin{bmatrix} +1 & +1 & +1 & +1 \\ +1 & +1 & -1 & -1 \\ +1 & -1 & -1 & +1 \\ +1 & -1 & +1 & -1 \end{bmatrix} \quad \begin{matrix} \text{(no sign changes)} \\ \text{(one sign change)} \\ \text{(two sign changes)} \\ \text{(three sign changes)} \end{matrix}$$

Note that the ordered W-H matrix is also orthogonal, so that the ordered inverse W-H transform is proportional to the ordered forward transform. In similar fashion, the unordered matrix $\mathbf{H}_8$ can be obtained by computing the direct product of $\mathbf{H}_4$ and $\mathbf{H}_2$:


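$$\mathbf{H}_8 = \mathbf{H}_4\times\mathbf{H}_2 = \mathbf{H}_2\times\mathbf{H}_4 = \begin{bmatrix} +\mathbf{H}_4 & +\mathbf{H}_4 \\ +\mathbf{H}_4 & -\mathbf{H}_4 \end{bmatrix} = \begin{bmatrix} +\mathbf{H}_2 & +\mathbf{H}_2 & +\mathbf{H}_2 & +\mathbf{H}_2 \\ +\mathbf{H}_2 & -\mathbf{H}_2 & +\mathbf{H}_2 & -\mathbf{H}_2 \\ +\mathbf{H}_2 & +\mathbf{H}_2 & -\mathbf{H}_2 & -\mathbf{H}_2 \\ +\mathbf{H}_2 & -\mathbf{H}_2 & -\mathbf{H}_2 & +\mathbf{H}_2 \end{bmatrix}$$

$$= \begin{bmatrix}
+1 & +1 & +1 & +1 & +1 & +1 & +1 & +1 \\
+1 & -1 & +1 & -1 & +1 & -1 & +1 & -1 \\
+1 & +1 & -1 & -1 & +1 & +1 & -1 & -1 \\
+1 & -1 & -1 & +1 & +1 & -1 & -1 & +1 \\
+1 & +1 & +1 & +1 & -1 & -1 & -1 & -1 \\
+1 & -1 & +1 & -1 & -1 & +1 & -1 & +1 \\
+1 & +1 & -1 & -1 & -1 & -1 & +1 & +1 \\
+1 & -1 & -1 & +1 & -1 & +1 & +1 & -1
\end{bmatrix}$$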

The sequency-ordered matrix is:

$$\begin{bmatrix}
+1 & +1 & +1 & +1 & +1 & +1 & +1 & +1 \\
+1 & +1 & +1 & +1 & -1 & -1 & -1 & -1 \\
+1 & +1 & -1 & -1 & -1 & -1 & +1 & +1 \\
+1 & +1 & -1 & -1 & +1 & +1 & -1 & -1 \\
+1 & -1 & -1 & +1 & +1 & -1 & -1 & +1 \\
+1 & -1 & -1 & +1 & -1 & +1 & +1 & -1 \\
+1 & -1 & +1 & -1 & -1 & +1 & -1 & +1 \\
+1 & -1 & +1 & -1 & +1 & -1 & +1 & -1
\end{bmatrix}$$

Basis functions of the Walsh-Hadamard transform arranged in order of increasing “sequency,” which is analogous to the “frequency” of the sinusoidal basis functions of the Fourier transform. An 8-sample 1-D input function is decomposed into how much of that function can be written as each of these basis functions.
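
The construction of the natural-order matrix and its reordering by sequency can be sketched in a few lines. This is an illustration only: the function names `hadamard` and `sequency` are hypothetical, and sorting the rows by their number of sign changes is just one simple way to obtain the sequency ordering.

```python
def hadamard(N):
    """Natural-order Hadamard matrix built by repeated direct products with H2.

    Assumes N is a power of 2. Each pass forms [[H, H], [H, -H]].
    """
    H = [[1]]
    while len(H) < N:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def sequency(row):
    """Number of sign changes along a row of the matrix."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

H8 = hadamard(8)
natural_sequencies = [sequency(r) for r in H8]      # sequency of each natural-order row
H8_seq = sorted(H8, key=sequency)                   # rows reordered by sequency
ordered_sequencies = [sequency(r) for r in H8_seq]

# Orthogonality check: the dot product of rows i and j is 8 if i == j, else 0.
gram = [[sum(a * b for a, b in zip(r, s)) for s in H8_seq] for r in H8_seq]
```

Each of the eight rows has a distinct number of sign changes between 0 and 7, so the sort is unambiguous, and the reordered matrix remains orthogonal because reordering rows does not change their dot products.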


19.7.1  Interpretation  of  the  W-H  Transform 


Note that the W-H matrices may be normalized so that the forward and inverse transforms are completely identical. This is done by dividing the elements of the matrix by $\sqrt{N}$ so that $\mathbf{H}_N\cdot\mathbf{H}_N = \mathbf{I}$.


The  relative  sizes  of  elements  of  the  W-H  representation  will  indicate  the  busyness 
of  the  input  image;  a  smooth  image  will  have  larger  values  of  the  W-H  transform  for 
small  values  of  k  while  the  W-H  transform  of  a  busy  image  will  be  larger  for  larger 
values  of  k.  The  gray  value  of  an  element  of  the  transform  may  be  interpreted  as  the 
similarity  between  the  input  image  and  the  mask  image.  The  elements  of  the  W-H 
transform  of  realistic  images  whose  gray  levels  are  well-correlated  (i.e.,  smooth)  will 
tend  to  be  large  for  small  values  of  k.  In  other  words,  the  energy  of  the  transformed 
image  will  tend  to  be  compressed  into  the  pixels  indexed  by  small  k:  pixels  with  large 
k will have small values. This property is called energy compaction, and is useful in (and in fact, is the whole basis for) signal compression.
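
The relation between “busyness” and sequency can be illustrated with two 8-pixel inputs. The sketch below is a hedged example: the two test signals are invented for illustration, the helper names are hypothetical, and the sequency ordering is obtained by sorting the natural-order rows by their number of sign changes.

```python
def hadamard(N):
    """Natural-order Hadamard matrix (assumes N is a power of 2)."""
    H = [[1]]
    while len(H) < N:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def sign_changes(row):
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

def wht_sequency(values):
    """W-H transform with output index k equal to the sequency of the mask."""
    H = sorted(hadamard(len(values)), key=sign_changes)
    return [sum(h * v for h, v in zip(row, values)) for row in H]

def dominant_sequency(values):
    """Index k >= 1 of the largest-magnitude coefficient (the k = 0 sum is skipped)."""
    w = wht_sequency(values)
    return max(range(1, len(w)), key=lambda k: abs(w[k]))

smooth = [10, 10, 11, 11, 12, 12, 13, 13]    # slowly varying gray values
busy = [10, 13, 10, 13, 10, 13, 10, 13]      # rapidly alternating gray values

w_smooth = wht_sequency(smooth)
w_busy = wht_sequency(busy)
```

For the smooth input the largest coefficient other than the k = 0 sum sits at k = 1, while for the alternating input it sits at k = 7, the highest sequency available for eight pixels.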




Examples of the 1-D Walsh-Hadamard transform evaluated over 64 pixels. As the input image gets “busier,” the maximum sequency of the W-H transform increases.


The set of mask functions for the 2-D W-H transform are products of the 1-D functions, i.e., the transform is separable. The 8 × 8 basis functions and the decomposition of two gray-scale images into their 8 × 8 block Walsh-Hadamard transforms are shown.

The 64 basis functions of the 8 × 8 Walsh-Hadamard transform: the 64 basis functions $f[n,m]$ are shown on the left and are segmented by red dashed lines. The resulting 8 × 8 W-H transforms $w[k,\ell]$ are on the right, and show that only one pixel is positive in each.


Two examples of 8 × 8 block Walsh-Hadamard transforms, showing that busier parts of the image produce more outputs with larger sequencies.


Chapter  20 

Image  Compression 


20.1  References  for  Image  Compression 

Many WWW sites on compression (millions on Google).

Key Papers in the Development of Information Theory, IEEE Press, 1974.

Raisbeck, Information Theory for Scientists and Engineers.

Ash, Information Theory, Dover.

Frieden, Probability, Statistical Optics, and Data Testing, Springer-Verlag, 1981.

Khinchin, Mathematical Foundations of Information Theory, Dover, 1957.

Pierce, J.R., An Introduction to Information Theory, Signals, Systems, and Noise, Dover.

Jain, A.K., “Image data compression: a review”, Proc. IEEE, 69, 349-389, 1981.

Rabbani, M. and P.W. Jones, Digital Image Compression Techniques, SPIE, 1991.

Abramson, N., Information Theory and Coding, McGraw-Hill, 1963.

Dainty, C. and R. Shaw, Image Science, Academic Press, 1974, sec 10.

Grant, R.E., A.B. Mahmoodi, and W.L. Nelson, “Image Compression and Transmission”, sec 11 in Imaging Processes and Materials, Neblette’s 8th edition, Van Nostrand Reinhold, 1989.


20.2 Image Storage, Transmission, and Compression

All of you probably have an intuitive feeling for the fundamental difficulties of storing and transmitting information; the well-known phrase “a picture is worth 1024 words” is an example. The subject has been quite important in imaging as far back as the early days of television development. It has become increasingly important as the digital juggernaut has taken hold. For comparison, consider the recent history of “imaging” of an audio music signal. The goal of the high fidelity industry for many years has been to record and play back ever more of the music signal by increasing the bandwidth of the recorded signal to reproduce the sound as realistically as possible. In other words, the industry has attempted to increase the amount of recorded “information” to increase the “fidelity” of the reproduced signal. However, the recording industry has been revolutionized in the last few years (since before the turn of the millennium) by the influx of digital recording technology. The goal now is to discard as much information as possible (and thus “compress” the signal) without degrading the “message” conveyed by the sound. The goal is to increase the quantity of music recorded on a device with fixed data capacity, thus reducing the cost of storing and conveying the message. This is only possible because digitally recorded signals can be manipulated in ways that are not available for analog recordings.

The  property  of  messages  that  allows  them  to  be  compressed  is  that  some  portion 
of  the  data  is  not  essential  to  understanding  of  the  message.  For  a  simple  example, 
consider  the  text  message: 


Th qck brn fx jmpd ovr th lz dg

which is perfectly readable (at least to most of us for whom English is our first language) but which requires only about 64% of the letters (23 of 36) in the complete text. The
moral  of  this  example  is  that  vowels  are  (usually)  unnecessary  to  the  meaning  of 
text  messages.  In  general,  vowels  supply  redundant  information  in  the  message  and 
often  may  be  discarded  without  loss  of  meaning.  In  an  imaging  sense,  we  might  say 
that  vowels  supply  low-frequency  information  (bulk  properties  of  the  words),  while 
consonants  carry  the  high-frequency  information  (information  about  the  “edges”). 
Redundancies  in  text  messages  or  images  may  be  identified  and  eliminated  using 
different  but  analogous  schemes.  The  data  compression  process  often  is  called  coding, 
because  the  representation  is  altered  and  may  require  a  reference  source  (codebook)  to 
interpret  the  message  in  terms  of  the  original  symbols.  The  name  coding  also  implies  a 
relationship  between  the  processes  of  message  compression  and  message  cryptography, 
which  are  (in  a  sense)  opposites.  Compression  is  the  process  of  removing  nonessential 
information  from  the  message  to  reduce  the  quantity  of  data;  cryptography  is  the 
process  of  adding  nonessential  information  to  different  messages  to  make  them  look 
indistinguishable. 

The  principles  of  image  compression  require  an  understanding  of  the  fundamental 
concepts  of  information  theory,  which  were  laid  down  by  Claude  Shannon  in  the  late 
1940s,  with  later  contributions  by  many  authors,  notably  Norbert  Wiener  and  A.I. 
Khinchin.  The  technologies  (hardware  and  software)  were  very  hot  topics  in  the 
1980s  because  of  limitations  in  storage  space  (as  difficult  as  it  may  be  to  believe  now, 
a  common  size  for  a  PC  hard  drive  was  30  MBytes  in  the  mid-late  1980s).  The  topic 
heated  up  again  in  the  late  1990s  for  internet  applications  because  of  limitations  in 
transmission  bandwidth. 

There are (at least) two classes of data redundancy: objective and subjective. Objectively redundant data can be removed without any loss of information. In other words, deleted objectively redundant data may be recovered without loss. Subjectively redundant data may be removed from a message or image without any “visible” loss of information, though the original image cannot be recovered without error. As a consequence, we can divide information compression into three classes: objectively lossless, subjectively lossless, and subjectively lossy. The message may be recovered perfectly after encoding with an objectively lossless algorithm (e.g., run-length compression or Huffman coding). The message may be recovered with no apparent loss if encoded with a subjectively lossless algorithm (e.g., JPEG encoding with a good quality factor). Images encoded with subjectively lossy algorithms have visible artifacts, but require significantly less transmission or storage capacity.

In this discussion of image compression for storage and transmission, we will first relate the concepts of image representation by electronic signals, both analog and digital. This will introduce the concepts of analog signal bandwidth and the limitations of real systems. We will then review Shannon’s basic description of information to introduce the concepts of image entropy and channel capacity, which will be related to analog bandwidth. After that, we will describe the definitive method developed by Huffman to encode an image to reduce the quantity of information in a message with a specific number and probability distribution of characters (gray levels). These concepts will be extended to encode images after an invertible image transformation. Proposed and adopted standard algorithms for image compression will be described, and a review of various image storage technologies will follow. The section will conclude with a description of the image coding algorithms in the Kodak PhotoCD™ consumer image recording and playback system.

Oppenheim  &  Shafer,  p.416+ 


20.3  Image  Compression 

20.3.1  Histograms,  Information  Theory,  and  Imaging 

A digital image is a matrix of pixels with each gray level described as a binary integer
of $m$ bits, for a total of $M = 2^m$ levels. An image of 512 × 512 pixels with 256 levels
requires $512^2 \cdot 8$ bits (256 KBytes) of storage. The number of bits needed to
store a full-color image is three times as large, as images in each of the three primary
colors are required. Though the cost of digital memory capacity continues to decrease,
and disk drives with capacities exceeding 300 GByte ($300 \cdot 1024^3 \approx 3 \cdot 10^{11}$ bytes)
are becoming common, the sizes of digital images continue to increase as well.
Six-megapixel cameras (3000 × 2000) are now quite affordable, even for many amateurs.
An output image from such a camera requires 2000 lines by 3000 pixels by 12 bits by
3 colors, or a total of 216 Mbits (≈ 25 MBytes per image). To help scholars read old
manuscripts, we regularly produce color images for display that are about 5000 × 7500
color pixels, or 107 MBytes (approximately 6 such images may be stored on a standard
CD-ROM). The requirements of this project have consistently confirmed Parkinson’s
Law of computing: that data to be stored always exceeds available storage capacity.
However, it is usually possible to store a $512^2$ 8-bit image (256 KBytes)
with good visual quality while using much less data. In other words, real images
“always” contain less information than the maximum possible; the difference is due
to the redundancy of image data, i.e., the gray value of a pixel in a realistic image
usually is not selected from a uniformly random distribution, but rather exhibits
some type of correlation with the gray values of other (usually neighboring) pixels.
An image or other message with redundant data may be compressed without loss of
information by removing some (or even all) of the redundancy. Given the ratio of
nonredundant data in a message or image to the total data, the redundancy $R$ in the
message may be defined as:

$$R = 1 - \frac{\text{Nonredundant Data [bits]}}{\text{Total Data [bits]}}$$

$R = 0$ if all bits are required to transmit the message without loss.
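
The storage figures and the redundancy ratio above are simple arithmetic, sketched here in Python. The helper names `raw_bits` and `redundancy` are illustrative assumptions rather than anything defined in the text, 1 KByte is taken as 1024 bytes, and the 5-of-8-bits case is a made-up example.

```python
def raw_bits(rows, cols, bits_per_sample, channels=1):
    """Bits needed to store an uncompressed image."""
    return rows * cols * bits_per_sample * channels


def redundancy(nonredundant_bits, total_bits):
    """R = 1 - (nonredundant data / total data); R = 0 means no redundancy."""
    return 1.0 - nonredundant_bits / total_bits


# 512 x 512 pixels at 8 bits per pixel
mono = raw_bits(512, 512, 8)
print(mono // 8 // 1024, "KBytes")          # 256 KBytes

# 3000 x 2000 camera, 12 bits per sample, 3 colors
camera = raw_bits(2000, 3000, 12, channels=3)
print(camera / 1e6, "Mbits")                # 216.0 Mbits

# Hypothetical case: only 5 of every 8 bits carry nonredundant data
print(redundancy(5, 8))                     # 0.375
```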

Any system that is constrained by transmission and/or storage capacity will benefit
from reducing (or even removing) data redundancy. For example, the total resolution
(spatial, temporal, and color) of the human visual system ultimately is limited by
the transmission capacity of the channel connecting the eye and brain, and removal
of redundancies is a principal task of the vision system. This also leads to the concept
of subjective redundancy, which means that some information may be discarded from
an image without visible impact on the quality, e.g., oscillations in gray value with
very short periods.

Obviously, to understand the principles of image compression by removing redundancy,
the concept of information must be considered first. “Information” is the
quantity of data in a message, i.e., how much data is required to describe the message,
measured in bits (for storage) or in bits per second or in channel bandwidth (Hz) for
transmission.

An image can be considered to be a message consisting of different gray values
obtained from some set of well-defined possible values. The quantity of information in
the message is a function of the probability of occurrence of the event described by the
message, e.g., a message indicating that the maximum air temperature $T_{max} = 70$ °F
in Rochester on New Year’s Day contains more information than a message that
$T_{max}$ [January 1] $= 25$ °F. The reason is probably obvious: $T_{max} = 70$ °F occurs
rarely (if ever) in Rochester in January (at least until global warming really kicks
in!). In words, we can define the importance of the data in the message by noting
that “the norm isn’t ‘news’, but a rare occurrence is.”

In 1948, Shannon strictly defined the quantity called “information” to satisfy two
basic requirements that make intuitive sense. In so doing, he began the study of
information theory, which is a part of applied probability theory. Though Shannon’s
original definition of information is very intuitive, it must be generalized to provide
a more complete description of the concept.

The concept of information may be interpreted as the minimum number of questions
with a binary answer set (the only possible outcomes are yes/no or 1/0) that
must be answered before the correct message may be determined. As thus defined,
information is synonymous with the removal of uncertainty. The uncertainty about
the result of experiment $X$ increases with the number of possible outcomes $n$; if more
distinct outcomes are possible, there is more information in a message stating which
outcome occurs. The information $I$ about $X$ should be a monotonically increasing
function of $n$, and thus a monotonically decreasing function of the probability of a
specific occurrence.

Shannon defined information by establishing intuitive criteria that the concept
should obey and finding a mathematical expression that satisfies these requirements.
As a first example, consider the simplest case of an experiment $X$ which has $n$ equally
likely possible outcomes (i.e., the probability of each outcome is $n^{-1}$). The information
about the result of experiment $X$ must be related to $n$; if $n$ is large, then the
information about the specific outcome should be large as well. If $X$ may be decomposed
into two independent experiments $Y$ (with $n_1$ equally likely possible outcomes)
and $Z$ (with $n_2$ equally likely outcomes), then:

$$n = n_1 \cdot n_2 \tag{1}$$

and:

$$p[X] = \frac{1}{n} = p[Y] \cdot p[Z] = \frac{1}{n_1} \cdot \frac{1}{n_2} = \frac{1}{n_1 \cdot n_2} \tag{2}$$


Thus the information $I$ about the outcome of experiment $X$ should be a function $f$
that satisfies the requirement:

$$I[X] = f[n] = f[n_1 \cdot n_2] \tag{3}$$

In addition, the information about a composite of independent experiments should
be equivalent to the sum of the information of the component experiments. This
establishes the second criterion for $I[X]$:

$$I[X] = I[Y] + I[Z] = f[n_1] + f[n_2] \tag{4}$$

Thus the information in a message describing the outcome of experiment $X$ with $n$
possible outcomes must satisfy:

$$I[X] \Longrightarrow f[n] = f[n_1 \cdot n_2] = f[n_1] + f[n_2] \tag{5}$$

One appropriate function that satisfies requirement 5 is the logarithm:

$$I[X] \Longrightarrow f[n] = c \log_b(n) = -c \log_b\left(\frac{1}{n}\right) = -c \log_b(p[X]) \tag{6}$$

where $c$ is some constant (which can be assumed to be unity for now) and $b$ is the
base of the logarithm such that $b > 1$. In fact, Khinchin proved that the logarithm
is the only continuous function that satisfies the required properties for any finite
number $n$ of possible outcomes (symbols). This definition of information satisfies the
additivity requirement for information, i.e.,

$$I[X] = \log_b[n] = \log_b[n_1 \cdot n_2] = \log_b[n_1] + \log_b[n_2] = I[Y] + I[Z] \tag{7}$$

The units of information are base-$b$ digits, e.g., decimal digits 0-9 for $b = 10$ and
(bi)nary digi(ts), or bits, for $b = 2$.

For a simple example, consider the quantity of information in a statement about
the result of tossing a fair coin (so that $p_H = p_T = 0.5$). The number of equally likely
outcomes is $n = 2$ and thus the information in the message about one “toss” is:

$$I[X] = \log_2[2] = 1 \text{ bit} \tag{8}$$

If the coin has two (indistinguishable!) heads, then there is only one (“equally likely”)
outcome and $p_H = 1$, $p_T = 0$. The information content in a statement about the
outcome of a toss of a two-headed coin is:

$$I[X] = \log_2[1] = 0 \text{ bits} \tag{9}$$
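
The logarithmic measure for equally likely outcomes, and its additivity over independent experiments, can be checked numerically. This is a minimal sketch; the function name `information` and the values $n_1 = 6$, $n_2 = 4$ are illustrative assumptions, and the constant $c$ is taken as unity.

```python
import math


def information(n_outcomes, base=2):
    """Information (in base-`base` digits) for n equally likely outcomes."""
    return math.log(n_outcomes, base)


print(information(2))   # fair coin: 1.0 bit
print(information(1))   # two-headed coin: 0.0 bits

# Additivity for independent experiments Y (n1 outcomes) and Z (n2 outcomes)
n1, n2 = 6, 4
assert math.isclose(information(n1 * n2), information(n1) + information(n2))
```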


Shannon’s original definition of information in eq. 6 applies only to equiprobabilistic
outcomes and must be generalized if the probabilities of the different outcomes
are not equal. Consider the case of an unfair coin with $p_H \neq p_T$ (both $p_H > 0$ and
$p_T > 0$). Recall that $p_H \cong N_H/N$ for large $N$, where $N_H$ is the number of heads in $N$ tosses.
This is an intermediate case between
the certain-outcome case ($p_T = 0 \Longrightarrow$ 0 bits of information per toss) and the fair
coin ($p_H = p_T = 0.5 \Longrightarrow$ 1 bit of information per toss). Intuitively, a message that
specifies the outcome when flipping such a coin will convey $\alpha$ bits of information
where $0 < \alpha < 1$. The outcome of the unfair coin may be considered to be the result
of a cascade of several experiments with equally likely outcomes, but where several
outcomes are equivalent. For example, consider the case of an unfair coin where
$p_H = 0.75$ and $p_T = 0.25$. The outcome of a single flip can be considered to have four
equally likely outcomes $A$-$D$, i.e.:

$$p_A = p_B = p_C = p_D = 0.25,$$

where outcomes $A$, $B$, $C$ are heads and outcome $D$ is a tail. The quantity of information
in a single experiment with four equally likely outcomes is:

$$I[X] = \log_2[4] = -\log_2\left[\frac{1}{4}\right] = 2 \text{ bits per toss}$$


However, since the probabilities of the two distinguishable outcomes are not equal,
there must be excess information in statements about the identical outcomes that
must be subtracted from the 2 bits. The excess information in the message about a
head is $\log_2[n_H]$ multiplied by the probability of a head $p_H$. Similarly for a tail, the
excess information is $p_T \log_2[n_T]$, where $p_T = n_T/n$ and $n = n_H + n_T$:

$$\text{Excess information for head} = \frac{3}{4}\log_2[3] = \frac{3}{4} \cdot 1.585 = 1.189 \text{ bits}$$

$$\text{Excess information for tail} = \frac{1}{4}\log_2[1] = \frac{1}{4} \cdot 0 = 0 \text{ bits}$$

After substituting these results, the total information is:

$$I[X] = \left(\log_2[4] - \frac{3}{4}\log_2[3] - \frac{1}{4}\log_2[1]\right) \text{ bits} = 2 - 1.189 = 0.811 \text{ bits}$$

The general expression for information content in the case of the unfair coin with
probabilities $p_H$ and $p_T$ is:

$$\begin{aligned}
I[X] &= \log_2[n] - \left(p_H \log_2[n_H] + p_T \log_2[n_T]\right) \\
&= (p_H + p_T)\log_2[n] - p_H \log_2[n_H] - p_T \log_2[n_T] \\
&= -p_H\left(\log_2[n_H] - \log_2[n]\right) - p_T\left(\log_2[n_T] - \log_2[n]\right) \\
&= -p_H \log_2\left[\frac{n_H}{n}\right] - p_T \log_2\left[\frac{n_T}{n}\right] \\
&= -p_H \log_2[p_H] - p_T \log_2[p_T].
\end{aligned} \tag{10}$$


Since $p_i \le 1$, $\log_2[p_i] \le 0$ and $I[X] \ge 0$; the quantity of information cannot be negative.
For $M$ possible outcomes with probabilities $p_i$, this definition of information can be
extended:

$$I[X] = -\sum_{i=0}^{M-1} p_i \log_2[p_i], \text{ subject to the constraint } \sum_{i=0}^{M-1} p_i = 1. \tag{11}$$

Any “impossible” outcome (with $p_i = 0$) is ignored to eliminate the problem of
calculating the logarithm of “0”.

If the $M$ outcomes are equally likely, so that $p_i = M^{-1}$, we have:

$$I[X] = -\sum_{i=0}^{M-1} \frac{1}{M}\log_2\left[\frac{1}{M}\right] = \log_2[M] \tag{12}$$

as before. This demonstrates that the generalized definition of information is consistent
with the earlier one.

For the case of the unfair coin with probabilities $p_H = 0.75$ and $p_T = 0.25$, the
quantity of information in a statement about the coin toss is:

$$I[X] = -0.75 \log_2[0.75] - 0.25 \log_2[0.25] = 0.811 \text{ bits} \tag{13}$$
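
The generalized definition can be checked against the three coins discussed so far. The sketch below assumes a helper named `entropy` (an illustrative name, not from the text) and also evaluates the “excess information” route, $\log_2[n] - p_H \log_2[n_H] - p_T \log_2[n_T]$, to confirm that both give the same value.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; outcomes with p = 0 are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


fair = entropy([0.5, 0.5])          # 1 bit per toss
two_headed = entropy([1.0, 0.0])    # 0 bits per toss
unfair = entropy([0.75, 0.25])      # about 0.811 bits per toss

# The "excess information" route: log2[n] - p_H log2[n_H] - p_T log2[n_T]
n_h, n_t = 3, 1
n = n_h + n_t
via_excess = math.log2(n) - (n_h / n) * math.log2(n_h) - (n_t / n) * math.log2(n_t)

print(round(unfair, 3), round(via_excess, 3))
print(math.ceil(100 * unfair), "bits for 100 tosses of the unfair coin")
```

The last line rounds the average of 81.1 bits up to a whole number of bits; reaching that figure in practice depends on the coding scheme.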

Now, you may be wondering what a fractional number of bits of information actually
“means”. It can be considered as the average uncertainty that is removed by the
message of the outcome of the experiment $X$. If we toss the fair coin 100 times, the
resulting string of outputs (e.g., HTTHHHTHT···) may be transmitted with 100
bits. The outputs of 100 tosses of the two-headed coin require 0 bits to transmit,
while the outcome of 100 tosses of the unfair coin may be specified by $100 \cdot 0.811 \approx 82$
bits, if the proper coding scheme is used. A method for reducing the number of bits
of such a message will be described shortly.

It is very important to note that Shannon’s definition of information assumes that
the outcomes of a particular experiment are random selections from some particular
probability distribution. If we flip a “256-sided” fair coin a large number of times (say
$N = 512 \times 512 = 262{,}144$ “flips”), we can generate an “image” with $N$ pixels, each
with 256 “outcomes” (gray levels) that will be approximately equally populated. If
the gray level of some specific pixel is 4, then the gray levels of its neighbors are (by
assumption) as likely to be 199 or 255 as 3 or 5. Such a system for generating data
may be called a discrete memoryless source or DMS because an individual output
from the system (pixel gray value) is independent of previous outputs. However, we
know that adjacent pixels in pictorial images of real scenes usually belong to the same
object and tend to have similar gray levels. Thus the gray values of adjacent pixels
are usually correlated. The effect of correlated pixels on the information content of
imagery will be considered shortly.

The measure of information (eq. 11) often is called the entropy because its form
is  identical  to  the  quantity  that  appears  frequently  in  statistical  thermodynamics. 
The  entropy  of  a  body  of  a  gas  depends  on  its  volume,  mass,  and  (most  of  all)  its 
temperature.  The  energy  of  the  gas  also  depends  on  these  three  properties.  One  of 
the  steps  in  the  Carnot  cycle  of  the  ideal  heat  engine  allows  a  gas  to  expand  within  a 
thermally  insulated  confined  space  by  pushing  against  a  (slowly  moving)  piston.  The 
insulation  ensures  that  no  heat  flows  into  or  out  of  the  gas.  The  expansion  requires 
that  the  gas  lose  some  of  its  thermal  energy  through  cooling;  this  is  the  principle  of  the 
refrigerator  or  air  conditioner.  The  lost  thermal  energy  performs  work  on  the  piston. 
Such  a  process  in  which  no  heat  flows  into  or  out  of  the  system  is  called  adiabatic 
and is reversible; the piston may do work on the gas, thus raising the temperature.
The  entropy  of  the  system  is  unchanged  by  a  reversible  process. 

In  real  life,  physical  interactions  are  not  reversible.  For  example,  no  gas  expansion 
is  truly  adiabatic  because  the  process  cannot  be  thermally  insulated  perfectly  from 
its  surroundings.  Real  gas  expansions  increase  the  entropy  of  the  system.  Consider  a 
gas  that  is  confined  in  a  box  that  is  divided  in  equal  parts  by  a  removable  partition. 
A  gas  is  confined  in  one  half  and  a  vacuum  exists  in  the  other.  Until  the  partition  is 
removed,  the  gas  is  capable  of  performing  useful  work  through  expansion  by  pushing 
on  a  piston.  However,  if  the  partition  is  removed  suddenly,  the  gas  expands  over  time 
to  fill  the  entire  container  without  performing  any  work.  No  thermal  energy  is  lost. 
However,  restoration  of  the  system  to  its  original  state  would  require  work  to  be  done 
on  the  gas  to  push  it  back  into  the  original  volume;  the  system  is  not  reversible.  We 
say  that  the  expansion  of  the  gas  increased  the  entropy  of  the  system  because  the 
process  was  not  reversible.  In  this  way,  the  entropy  is  a  measure  of  the  capability  of 
the  system  to  perform  useful  work  by  changing  thermal  energy  into  mechanical  energy. 
Equivalently,  it  is  a  measure  of  the  disorder  of  the  system.  The  original  system  was 
more  ordered  before  removing  the  partition  because  we  had  more  knowledge  about  the 
location  of  the  molecules  of  the  gas,  thus  the  entropy  was  lower  before  the  expansion. 
The entropy is a statistical description; we do not have a complete description of the
location and velocity of each molecule either before or after removal of the partition.

The concept of entropy as a measure of disorder is more applicable to the
current problem. The energy of a set of coin flips (or the arrangement of gray levels
in an image) is determined by the statistics of the outcome, i.e., by the histogram of
results. The entropy of the set of coin flips or image gray levels is determined by the
uncertainty in their arrangement given knowledge of the statistics (the histogram).
The less likely the result, the more information in a message that the result has
occurred or will occur.


20.3.2  Information  Content  of  Images 

An image can be considered as the output of an ensemble of pixels (experiments)
whose results are derived from a discrete set of possibilities (the histogram). It is
the histogram of the image with $M$ gray levels which determines the quantity of
information as defined by Shannon via the entropy equation:

$$I[X] = -\sum_{i=0}^{M-1} p_i \log_2[p_i] \tag{14}$$
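
As a rough illustration of eq. 14, the entropy of an image can be computed directly from its gray-level histogram. The sketch below uses only the Python standard library; the function name `image_entropy` and the three tiny 16-pixel “images” are hypothetical. Because only the histogram enters the calculation, the result is unchanged if the pixels are rearranged.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Entropy (bits per pixel) computed from the gray-level histogram."""
    histogram = Counter(pixels)              # gray level -> population
    n_pixels = len(pixels)
    return -sum((count / n_pixels) * math.log2(count / n_pixels)
                for count in histogram.values())


# Hypothetical 16-pixel test "images", stored as flat lists of gray levels
constant = [7] * 16                           # one populated level
two_level = [0] * 8 + [255] * 8               # two equally populated levels
clustered = [10] * 12 + [11] * 2 + [12, 13]   # mostly one level

for name, img in [("constant", constant), ("two_level", two_level),
                  ("clustered", clustered)]:
    print(name, round(image_entropy(img), 3))
```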


For most realistic monochrome images digitized to 8 bits per pixel, the entropy is
in the range of 4-6 bits per pixel. To further investigate the meaning of Shannon’s
definition of information, consider that an image with $N$ pixels and $M$ gray levels can
be produced from $N$ experiments with each having $M$ possible outcomes for a total
of $M^N$ possible distinct images. In the binary image case, the gray values may be
produced by $N$ coin flips. For a 1 × 1 image, $N = 1$ and the number of possible images
is $2^1 = 2$ (0 and 1), and there are also two possible histograms (1,0) and (0,1). There
are $2^2 = 4$ possible two-pixel binary images (11, 10, 01, and 00) and three possible
histograms (2,0), (1,1), and (0,2). Note that there are two images with histogram
(1,1). For $N = 3$ pixels, there are $2^3 = 8$ distinct images (111, 110, 101, 011, 100,
010, 001, 000) and four possible histograms (3,0), (2,1), (1,2), and (0,3). There is one
image each with histogram (3,0) or (0,3), and three images each with histogram (2,1)
and (1,2). The number of possible binary images that share each
distinct histogram is specified by the set of binomial coefficients:


$$N[N_0, N_1] = \frac{N!}{N_0! \cdot N_1!}$$


where $N$ is the number of pixels in the image, and $N_0$, $N_1$ are the number of pixels
with level 0 and 1, respectively. Note that the constraint $N_0 + N_1 = N$ must be
satisfied. The array of the binomial coefficients defines Pascal’s triangle as shown
below, where the row number represents the number of pixels in the image, the sum
of the numbers in the row is the number of possible images, the number of groups in
each row is the number of possible histograms, and the number in each group is the
number of images that has a particular histogram:

1  1 : $2^1 = 2$ images with 1 binary pixel

1  2  1 : $2^2 = 4$ images with 2 binary pixels

1  3  3  1 : $2^3 = 8$ images with 3 binary pixels

1  4  6  4  1 : $2^4 = 16$ images with 4 binary pixels

1  5  10  10  5  1 : $2^5 = 32$ images with 5 binary pixels

1  6  15  20  15  6  1 : $2^6 = 64$ images with 6 binary pixels

Number of binary images with different histograms for N = 1, 2, 3, 4, 5, 6


For  example,  there  are  1  +  4  +  6  +  4  +  1  =  16  images  composed  of  four  binary 
pixels  with  five  different  histograms.  One  image  has  histogram  (4,0),  one  with  (0,4), 
four  each  with  histograms  (3,1)  and  (1,3),  and  six  images  with  histogram  (2,2).  Note 
that the number of images for a specific histogram increases as the population of
gray  values  becomes  more  equally  distributed,  i.e.,  toward  the  center  of  the  row  in 
Pascal’s  triangle.  As  we  will  soon  demonstrate,  the  information  content  of  an  image 
with  a  specified  histogram  is  largest  when  the  histogram  populations  are  equal,  i.e., 
for  the  histogram  with  the  largest  number  of  possible  images  and  thus  the  greatest 
uncertainty  of  image  content. 
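
For small $N$ the counts in Pascal’s triangle can be confirmed by brute force: enumerate every binary image, group the images by histogram, and compare with the binomial coefficients. This is an illustrative sketch; the function name `histogram_counts` is an assumption, and the enumeration is practical only for a handful of pixels.

```python
from collections import Counter
from itertools import product
from math import comb


def histogram_counts(n_pixels):
    """Map each histogram (N0, N1) to the number of binary images having it."""
    counts = Counter()
    for image in product((0, 1), repeat=n_pixels):
        ones = sum(image)
        counts[(n_pixels - ones, ones)] += 1
    return counts


for n in range(1, 7):
    counts = histogram_counts(n)
    row = [counts[(n - k, k)] for k in range(n + 1)]
    # Brute-force counts should match the binomial coefficients N! / (N0! N1!)
    assert row == [comb(n, k) for k in range(n + 1)]
    print(n, row, "total:", sum(row))
```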

For  images  with  M  gray  levels  (M  >  2),  calculation  of  the  number  of  possible 
images  and  histograms  is  somewhat  more  complicated.  The  multilevel  analog  of  the 
binomial  coefficient  used  in  Pascal’s  triangle  is  the  multinomial  coefficient: 


$$N[N_0, N_1, N_2, N_3, \cdots, N_{M-1}] = \frac{N!}{N_0!\, N_1!\, N_2! \cdots N_{M-1}!}$$

where $N$ is the number of pixels and $N_0, N_1, N_2, \cdots, N_{M-1}$ are the populations of each
gray level subject to the constraint $N_0 + N_1 + N_2 + \cdots + N_{M-1} = N$. For $N = 8$ pixels
and $M = 4$ possible gray levels (0-3; 2 bits of quantization), the number of possible
images is $4^8 = 65{,}536 = 64$K. The number of distinct 8-pixel images with histogram
$[N_0, N_1, N_2, N_3]$ is:

$$N[N_0, N_1, N_2, N_3] = \frac{8!}{N_0!\, N_1!\, N_2!\, N_3!},$$

where $N_0 + N_1 + N_2 + N_3 = 8$. For example, if the histogram is known to be (4,4,0,0),
the number of possible 8-pixel 2-bit images is:

$$N[4, 4, 0, 0] = \frac{8!}{4!\, 4!\, 0!\, 0!} = 70.$$


Other specific histograms yield $N[3, 3, 1, 1] = 1120$ possible images, $N[3, 2, 2, 1] = 1680$
images, and $N[2, 2, 2, 2] = 2520$ images. Again, note that the number of distinct
images increases as the histogram becomes “flatter”. Also recall that Shannon’s
definition of information is determined by the image histogram. Because there are more
distinct images with a flat histogram than with a clustered histogram, the “amount”
of information should be larger for an image with a flat histogram. An image with a
flat histogram indicates that there is maximum uncertainty and maximum information
content, and thus requires the most bits of data for complete specification.
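
The multinomial counts for the 8-pixel, 4-level case are small enough to evaluate exactly. The following sketch assumes two helper names, `multinomial` and `histograms`, that do not appear in the text; it reproduces the four counts quoted above and checks that summing over every possible histogram recovers the total of $4^8$ images.

```python
from math import factorial


def multinomial(populations):
    """Number of distinct images whose histogram is `populations`."""
    result = factorial(sum(populations))
    for count in populations:
        result //= factorial(count)
    return result


def histograms(n_pixels, n_levels):
    """Yield every histogram of n_pixels spread over n_levels gray levels."""
    if n_levels == 1:
        yield (n_pixels,)
        return
    for first in range(n_pixels + 1):
        for rest in histograms(n_pixels - first, n_levels - 1):
            yield (first,) + rest


for h in [(4, 4, 0, 0), (3, 3, 1, 1), (3, 2, 2, 1), (2, 2, 2, 2)]:
    print(h, multinomial(h))

# Summing over all histograms recovers the total number of images, 4**8
total = sum(multinomial(h) for h in histograms(8, 4))
print(total)
```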

Now consider the number of possible images with specific histograms for more
“reasonable” image formats. A common size for a monochrome digital image is
$N \times N = 512^2$ pixels and $M = 256$ levels, so that the number of distinguishable images
is:

$$M^{N \times N} = 256^{(512^2)} = 2^{(8 \cdot 2^9 \cdot 2^9)} = 2^{2{,}097{,}152} = \left(10^{\log_{10}[2]}\right)^{2{,}097{,}152} \cong \left(10^{0.30103}\right)^{2{,}097{,}152} = 10^{0.30103 \cdot 2{,}097{,}152} \cong 10^{631{,}305.66}$$

The size of this number may be gauged against the estimates that there are $10^{78}$
atoms and $10^{88}$ cubic millimeters in the universe. Of these MANY images, only a
single one has the histogram with all pixels clustered at gray level 0. There are $512^2$
possible images with one pixel at gray level 1 and the rest (262,143) at gray level zero.
The number of images with two pixels at gray level 1 and 262,142 at 0 is:

$$N[262142, 2, 0, \cdots, 0] = \frac{(262144)!}{(262142!)\,(2!)\,(0!) \cdots (0!)} \cong 3.44 \cdot 10^{10}$$


If we continue this progression and add more populated gray levels to “flatten” the
histogram, the number of distinguishable images increases very rapidly. An
image with a perfectly flat histogram has $512^2/256 = 1024$ pixels in each level; the
number of such images is:

$$N[1024_0, 1024_1, \cdots, 1024_{255}] = \frac{(512^2)!}{(1024!)^{256}} \cong 10^{630{,}821}.$$

The approximation was derived via Stirling’s formula for $N!$ where $N$ is large:

$$N! \cong \sqrt{2\pi N} \cdot N^N \cdot e^{-N} \quad (N \to \infty)$$


We should check this formula; substitute $N = 10$ to find that

$$10! \cong 3.5987 \times 10^6$$

whereas the actual result is $10! = 3.6288 \times 10^6$, so the relative error is less than 1%
(about $8 \times 10^{-3}$).
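
The check of Stirling’s formula at $N = 10$ takes only a few lines. This is a minimal sketch; the function name `stirling` is an assumption, and the formula is the leading-order form only, without higher-order correction terms.

```python
import math


def stirling(n):
    """Stirling's approximation: sqrt(2 pi n) * n**n * exp(-n)."""
    return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)


exact = math.factorial(10)
approx = stirling(10)
relative_error = (exact - approx) / exact
print(exact, round(approx), f"{relative_error:.2%}")
```

The relative error of the leading-order formula shrinks roughly as $1/(12N)$, so it is far smaller for the factorials of 1024 and 262,144 used in the image counts.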

If 254 levels are populated with 1028 pixels each and one level has 1032 pixels (a
“slightly clustered” histogram, which leaves the remaining level empty), the number of images is:

$$N[1028_0, 1028_1, \cdots, 1028_{253}, 1032_{254}, 0_{255}] = \frac{(512^2)!}{(1028!)^{254}\,(1032!)\,(0!)} \cong 10^{630{,}377},$$

which is smaller than the number of images with a flat histogram (by a factor of
$10^{444}$, which is still pretty large!). Note that the number of possible images again is
maximized when the histogram is flat; the uncertainty, the information content, and
thus the number of bits to specify the image are maximized when the histogram is
flat.
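
Counts this large cannot be held as ordinary floating-point numbers, but their base-10 logarithms can be evaluated with the log-gamma function instead of Stirling’s formula. The sketch below is one way to do so with the Python standard library; the helper names `log10_factorial` and `log10_image_count` are assumptions made for illustration.

```python
import math

LN10 = math.log(10)


def log10_factorial(n):
    """log10(n!) computed with the log-gamma function, lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / LN10


def log10_image_count(populations):
    """log10 of the multinomial coefficient for a histogram."""
    n_pixels = sum(populations)
    return log10_factorial(n_pixels) - sum(log10_factorial(k) for k in populations)


n_pixels, n_levels = 512 * 512, 256

log10_all = n_pixels * math.log10(n_levels)           # all 256**(512**2) images
flat = log10_image_count([1024] * 256)                # flat histogram
clustered = log10_image_count([1028] * 254 + [1032, 0])
two_ones = math.comb(n_pixels, 2)                     # two pixels at level 1

print(round(log10_all, 2), round(flat), round(flat - clustered), two_ones)
```

The printed values are the exponents of the total image count and of the flat-histogram count, the exponent of the ratio between the flat and slightly clustered counts, and the exact number of images with two pixels at level 1.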


20.3.3  Maximizing  Image  Information 


We  just  showed  by  example  that  an  image  with  a  flat  histogram  has  the  maximum 
possible  information  content.  The  rigorous  proof  of  this  assertion  is  an  optimization 
problem. 

The probability distribution of gray levels for $f[n,m]$ is proportional
to the image histogram $H[f]$:

$$p_i = \frac{H[f_i]}{N}$$

where $H[f_i]$ is the population of gray level $i$ and $N$ is the number of pixels in the image. Since the maximum possible population
of a gray level is the total number of pixels (i.e., $N$), $0 \le p_i \le 1$ as required. The
problem is to find the set of gray-level probabilities $\{p_i\}$ for $M$ levels that maximizes
information content:

$$I[f] = -\sum_{i=0}^{M-1} p_i \log_b[p_i]$$

subject to the constraint:

$$\sum_{i=0}^{M-1} p_i = 1.$$

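To maximize $I$, we set its total derivative equal to zero:

$$dI = \frac{\partial I}{\partial p_0}\,dp_0 + \frac{\partial I}{\partial p_1}\,dp_1 + \frac{\partial I}{\partial p_2}\,dp_2 + \cdots + \frac{\partial I}{\partial p_{M-1}}\,dp_{M-1} = 0,$$

subject to the constraint that the probabilities sum to a constant:

$$d\left(\sum_{i=0}^{M-1} p_i\right) = 0$$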

This optimization problem is easily solved by Lagrangian multipliers. If we maximize a
linear combination of $I[f]$ and $\sum p_i$, we will automatically maximize $I[f]$. Construct
a function $L[f]$ to be maximized:

$$L[f] = I[f] - \lambda \sum_{i=0}^{M-1} p_i = -\sum_{i=0}^{M-1} p_i \log_b[p_i] - \lambda \sum_{i=0}^{M-1} p_i = -\sum_{i=0}^{M-1} p_i \left(\log_b[p_i] + \lambda\right),$$

where $\lambda$, the Lagrangian multiplier, is a constant to be determined. To maximize $L$,
we set its total derivative equal to zero:

$$dL = \sum_{i=0}^{M-1} \frac{\partial L}{\partial p_i}\,dp_i = 0$$


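Because the differential probabilities $dp_i$ are arbitrary, a necessary and sufficient
condition for $dL$ to vanish is to have the component derivatives $\partial L/\partial p_i$ vanish individually.
Taking the logarithm to be natural ($b = e$), so that $\partial \log_b[p_i]/\partial p_i = 1/p_i$ (any other
base changes only the value of the constant):

$$\frac{\partial L}{\partial p_i} = -\left[\frac{\partial p_i}{\partial p_i}\left(\log_b[p_i] + \lambda\right) + p_i \frac{\partial}{\partial p_i}\left(\log_b[p_i] + \lambda\right)\right] = -\left[\left(\log_b[p_i] + \lambda\right) + p_i \cdot \frac{1}{p_i}\right] = -\left(1 + \log_b[p_i] + \lambda\right) = 0, \text{ for all } i$$

$$\Longrightarrow \log_b[p_i] = -(1 + \lambda)$$

$$\Longrightarrow p_i = b^{-(1+\lambda)}, \text{ where } \lambda \text{ is constant.}$$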


Note that the probability $p_i$ for the occurrence of the $i$th level is constant, thus the
probabilities of all outcomes must be equal when $L[f]$ (and hence $I[f]$) is maximized.
The constraint allows us to calculate the numerical value of the Lagrangian multiplier:

$$\sum_{i=0}^{M-1} p_i = 1 = \sum_{i=0}^{M-1} b^{-(1+\lambda)} = M\, b^{-(1+\lambda)} = M\, p_i \Longrightarrow p_i = \frac{1}{M}$$

In  words,  the  information  content  I  [f]  of  an  image  f[n,m]  is  maximized  when  all 
gray  levels  are  equally  populated,  i.e.,  when  the  histogram  is  flat.  Slow  variations  in 
gray  level  from  pixel  to  pixel  are  most  visible  in  an  image  with  a  flat  histogram.  This 
is  (of  course)  the  motivation  for  “histogram  equalization”  in  digital  image  processing. 
We  should  note  that  histograms  of  quantized  images  cannot  be  “equalized”  in  a  strict 
sense  unless  the  gray  values  of  some  pixels  are  changed  to  fill  underpopulated  bins. 
This  can  be  performed  only  as  an  approximation  by  considering  the  gray  values  of 
neighboring  pixels. 
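
The result of the optimization can be illustrated numerically: starting from a clustered distribution and moving it step by step toward the flat one, the entropy should never decrease and should reach $\log_2[M]$ at the end. The sketch below is an illustration under assumed values (a hypothetical 4-level distribution and the helper names `entropy` and `blend`), not a proof.

```python
import math


def entropy(p):
    """Shannon entropy in bits, skipping zero-probability levels."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def blend(p, t):
    """Move distribution p a fraction t of the way toward the flat one."""
    m = len(p)
    return [(1 - t) * x + t / m for x in p]


clustered = [0.70, 0.20, 0.05, 0.05]      # hypothetical 4-level histogram / N
steps = [0.0, 0.25, 0.5, 0.75, 1.0]
entropies = [entropy(blend(clustered, t)) for t in steps]

for t, h in zip(steps, entropies):
    print(t, round(h, 4))
```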


20.3.4  Information  Content  (Entropy)  of  Natural  Images 

The redundancy $R$ in a message was already defined as a dimensionless parameter
based on the ratio of nonredundant data to the total data:

$$R = 1 - \frac{\text{Nonredundant Data}}{\text{Total Data}}$$

Because the human visual system is constrained by transmission or storage capacity,
it is useful to reduce/remove data redundancy beforehand. For example, the total
resolution (spatial, temporal, and color) ultimately is limited by the transmission
capacity of the channel connecting the eye and brain, and removal of such redundancies
is a principal task of the vision system. Examples of HVS mechanisms that remove
redundancy include lateral inhibition of the neural net, opponent-color processing,
and the nonlinear response of the vision receptors.


In Predictability and redundancy of natural images (JOSA A, Vol. 4, 2395,
1987), D. Kersten estimated the redundancy of some natural images by presenting to
observers eight $128^2$ 4-bit images (16,384 pixels, each with 16 gray levels stretched
over the available dynamic range). The 4-bit quantization ensured that the images
had significant redundancy and precluded the presence of significant additive noise in
the quantized values (i.e., additive noise would be typically quantized to zero). The
images ranged in content from “simple” to “busy” and included natural scenes. The
pixels subtended a rectangle of size 10 × 7 arcminutes to the observer. A predetermined
fraction of the pixels were deleted and replaced with one of the 16 gray levels chosen
at random. For example, if the original image were in fact a random image, then it
would be impossible to determine which pixels were altered; such an image would
have no redundancy. If the original image were pictorial (e.g., a human face), then
most of the replaced pixels would be obvious; such an image has significant redundancy.
Observers guessed at the gray value of the replaced pixel until the correct value was
assigned. The larger the number of guesses, the more uncertain the gray level of the
pixel. The histogram of the number of guesses was used to determine the entropy
$I[f]$. The number of guesses at each replaced pixel for the random image should be
large (mean value of 8), and small for the pictorial image. Therefore, the redundancy is:

$$R[f] = 1 - \frac{I[f]}{I_{max}} = 1 - \frac{I[f]}{4 \text{ bits}}$$

where $I[f]$ is the entropy of image $f$ and $I_{max}$ is the number of quantization bits.
Kersten’s results indicated that the eight images have redundancies in the interval:

$$0.46 \text{ (“busy” picture)} \le R \le 0.74 \text{ (picture of face)}$$

The corresponding content of nonredundant information is:

$$2.16 \text{ bits per pixel} \ge I[f] \ge 1.04 \text{ bits per pixel}$$

Based on the previous discussion, the number of $128^2$ 4-bit images is:

$$\text{Number} = 2^{4 \cdot 128^2} = 2^{65{,}536} \cong 10^{19{,}728}$$

But the number of natural images would lie in the range:

$$2^{2.16 \cdot 128^2} \cong 2^{35{,}389} \cong 10^{10{,}653} \ge \text{Number} \ge 2^{1.04 \cdot 128^2} \cong 2^{17{,}039} \cong 10^{5{,}129}$$

which is STILL many times larger than the estimated number of atoms in the universe!

20.4  Lossless  Compression

20.4.1  Run-Length  Encoding 

Run-length encoding is a very simple method for compression of sequential data (and
particularly for binary data) that takes advantage of the fact that consecutive single
“tokens” (gray values) are often identical in many data sets. Run-length encoding
inserts a special token each time a chain of more than two equal input tokens is
found. This special token advises the decoder to insert the following token $n$ times
into its output stream. For example, consider the sequence of 3-bit data:


7 7 7 2 6 6 6 6 6 2 2 5 5 5 5 5 5 5 5 5

The  encoded  RLE  image  would  be: 

7321652259 

where the first digit in the encoded image is the gray value of the first pixel, the
second digit is the number of occurrences, etc. The sequence is reduced from 20 3-bit
numbers (60 bits) to 10 digits (though the long run of “5” at the end requires a 4-bit
number to encode). This means that this sequence might be encoded into 5 3-bit
numbers and 5 4-bit numbers, for a total of 35 bits. If the system were limited to
three bits, the sequence of 9 examples of level “5” would be split into one sequence
of 7 and one of 2:

732165225752

for a total of 12 3-bit numbers, or 36 bits.

The  compression  in  RLE  occurs  when  the  image  exhibits  strings  of  the  same  value. 
If  the  image  is  “noisy” ,  then  the  image  after  RLE  coding  will  likely  require  more  bits 
than  the  uncompressed  image. 
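
The (value, count) scheme used in the example can be sketched in Python as follows. This is a minimal illustration rather than a production codec; the function names and the `max_run` parameter (which models a limit on the size of the count field) are our own assumptions.

```python
from itertools import groupby

def rle_encode(values, max_run=None):
    """Return a list of (value, run_length) pairs; runs longer than max_run are split."""
    pairs = []
    for value, group in groupby(values):
        run = len(list(group))
        if max_run is not None:
            # Emit full-length chunks until the remainder fits in the count field.
            while run > max_run:
                pairs.append((value, max_run))
                run -= max_run
        pairs.append((value, run))
    return pairs

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, run in pairs:
        out.extend([value] * run)
    return out

data = [7, 7, 7, 2, 6, 6, 6, 6, 6, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5]
pairs = rle_encode(data)                   # unlimited count field
pairs_3bit = rle_encode(data, max_run=7)   # count limited to 3 bits (maximum 7)
print(pairs)
print(pairs_3bit)
```

Decoding either list with `rle_decode` returns the original 20 values, which is what makes the scheme lossless.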

The common “bitmap” image format (.BMP) supports run-length encoding as an optional compression mode.

20.4.2  Huffman  Code 

We  have  already  seen  that  the  information  content  of  an  image  is  a  function  of  the 
image  histogram;  an  image  with  a  clustered  histogram  contains  less  information  than 
an  image  with  a  flat  histogram  because  the  former  contains  statistically  redundant 
information.  This  implies  the  existence  of  methods  to  encode  the  information  with 
fewer  bits  while  still  allowing  perfect  recovery  of  the  image.  The  concept  of  such  a 
code  is  quite  simple:  the  most  common  gray  levels  are  assigned  to  a  code  word  that 
requires  few  bits  to  store/ transmit,  while  levels  which  occur  infrequently  are  encoded 
with  many  bits;  the  average  number  of  bits  per  pixel  is  reduced  by  attempting  to 
equalize  the  number  of  bits  per  gray  level.  A  procedure  for  obtaining  a  code  from 
the  histogram  was  specified  by  David  A.  Huffman  in  1951  while  he  was  a  25-year-old 
graduate student at MIT. Huffman developed the coding scheme as part of a final
assignment in a course on information theory given by Prof. Robert M. Fano. After
working unsuccessfully on the problem for months, the solution came to Huffman as
he was tossing his notebooks into the trash (related in Scientific American, September
1991, pp. 54-58). He presented his method to Fano, who asked, “Is that all there is to it?”
Huffman’s coding scheme is now ubiquitous. (This story must be a metaphor for
something.) (Huffman, D.A., Proc. IRE 40, 1098-1101, 1952).

The  Huffman  code  assumes  that  the  gray  values  are  selected  at  random  from  some 
known  probability  distribution.  In  other  words,  it  assumes  that  the  gray  values  of 
adjacent  pixels  are  unrelated  (which  is,  of  course,  not  true  for  meaningful  pictorial 
images,  though  perhaps  true  for  images  transformed  to  a  different  coordinate  system). 
A  source  of  uncorrelated  random  numbers  is  called  “memoryless”. 

The average code length of the Huffman-coded image is always within 1 bit per pixel of the
information content defined by Shannon. The Huffman code removes bits from the
message by discarding objective redundancy. It is lossless and unambiguous. The
first quality means that the original image may be reconstructed without error from
knowledge of the coded image and the code book. The quality of no ambiguity ensures
that only one set of gray levels may be decoded from an ungarbled string of binary
digits encoded by the Huffman procedure.

The Huffman code is perhaps best described by example. The pixels in an image
with 8 gray levels are usually specified by a binary code that requires 3 bits per
pixel. For attempted clarity, the gray levels will be indicated by alphabetic characters:


Decimal   Binary   Alphabetic
0         000      A
1         001      B
2         010      C
3         011      D
4         100      E
5         101      F
6         110      G
7         111      H


Consider a 3-bit 100-pixel image with levels distributed as in the following histogram:

H[A, B, C, D, E, F, G, H] = [0, 9, 12, 40, 30, 5, 4, 0]

The probability of each gray level therefore is:

p[A] = 0, p[B] = 0.09, p[C] = 0.12, p[D] = 0.40, p[E] = 0.30, p[F] = 0.05, p[G] = 0.04, p[H] = 0
The information content in this image is obtained from Shannon’s entropy formula:

$$I = -0.4 \log_2[0.4] - 0.3 \log_2[0.3] - \cdots = 0.529 + 0.521 + \cdots + 0 = 2.131\ \text{bits/pixel}$$
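
The entropy sum can be evaluated directly from the histogram. The sketch below assumes only the histogram given in the text; the function name `shannon_entropy` is ours. Unoccupied levels (A and H) are skipped because their contribution to the sum is zero.

```python
import math

def shannon_entropy(counts):
    """Entropy in bits per sample of a histogram given as an iterable of counts."""
    counts = list(counts)
    total = sum(counts)
    # Empty bins contribute nothing (the limit of p*log2(p) as p -> 0 is 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

histogram = {"A": 0, "B": 9, "C": 12, "D": 40, "E": 30, "F": 5, "G": 4, "H": 0}
entropy = shannon_entropy(histogram.values())
print(f"{entropy:.3f} bits/pixel")
```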

which  is  considerably  less  than  the  3-bit  quantization.  To  derive  the  Huffman  code, 
arrange  the  occupied  levels  in  descending  order  of  probability.  One  bit  is  used  to 
distinguish  the  least  likely  pair  of  levels  using  either  the  convention  that  the  binary 
digit  1  is  assigned  to  the  more  probable  level  and  0  to  the  less  probable,  or  vice 
versa.  This  is  the  least  significant  bit  in  the  Huffman  code,  as  it  distinguishes  the 
most  rarely  occurring  gray  levels.  The  probabilities  of  the  two  rarest  levels  are  then 
summed  (giving  a  total  probability  of  0.09  in  this  case)  and  the  list  of  gray  values 
is  rearranged  in  descending  order.  In  this  example,  the  sum  of  the  probabilities 
of  the  rarest  levels  F  and  G  is  equal  to  the  probability  of  the  next  level  B,  and  so 
reordering  is  not  required.  The  new  pair  of  least-likely  levels  in  the  rearranged  list  are 
distinguished  by  again  assigning  a  binary  digit  using  the  same  convention  as  before. 
The  last  two  probabilities  are  summed,  the  list  is  reordered,  and  the  process  continues 
until  bits  are  assigned  to  the  last  pair  of  probabilities.  The  last  binary  digit  derived 
using  this  procedure  is  the  most  significant  bit.  The  schematic  of  the  entire  sequence 
is  shown  in  the  figure: 


[Figure: Huffman coding tree with columns “Level” and “Probability”. Resulting codes:
D = 0, E = 11, C = 100, B = 1011, F = 10101, G = 10100]


Calculation  of  Huffman  code  for  3-bit  image  with  probabilities  as  listed  in  the  text. 
The  gray  levels  are  arranged  in  descending  order  of  probability.  One  bit  is  used  to 
distinguish  the  rarest  levels,  the  probabilities  are  summed,  the  list  is  rearranged  in 
descending  order,  and  the  process  continues. 


The Huffman code for a specific gray level is the sequence of binary digits assigned
from right to left for that level, i.e., from most significant to least significant bit. The
code for level B is obtained by following the sequence from the right: the path for
level B picks up the bits 1, 0, 1, and 1 at successive junctions, so that the code for
level B is 1011 (binary). The codes
for the other levels are found similarly and the resulting code book is listed above.
The  number  of  bits  to  encode  all  pixels  having  each  gray  level  may  be  calculated,  and 
their  sum  is  the  average  number  of  bits  per  pixel  required  to  encode  the  entire  image: 
Gray Level f   p[f]   Code     Bits in Code   ⟨Bits⟩ for Level
D              0.40   0        1              0.40 × 1 = 0.40
E              0.30   11       2              0.30 × 2 = 0.60
C              0.12   100      3              0.12 × 3 = 0.36
B              0.09   1011     4              0.09 × 4 = 0.36
F              0.05   10101    5              0.05 × 5 = 0.25
G              0.04   10100    5              0.04 × 5 = 0.20

⟨Bits⟩ for Image = 2.17 bits/pixel
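
The merging procedure described above (repeatedly combine the two least probable entries) maps naturally onto a priority queue. The following Python sketch computes only the codeword lengths, which are enough to obtain the average number of bits per pixel; the function name and the use of integer histogram counts instead of probabilities are our choices. The particular 0/1 labels depend on the convention chosen at each junction, so they are not generated here.

```python
import heapq
from itertools import count

def huffman_code_lengths(counts):
    """Return {symbol: codeword length} for the occupied symbols of a histogram."""
    tie = count()  # unique tie-breaker so the heap never has to compare symbol lists
    heap = [(c, next(tie), [s]) for s, c in counts.items() if c > 0]
    heapq.heapify(heap)
    lengths = {s: 0 for s, c in counts.items() if c > 0}
    while len(heap) > 1:
        # Merge the two least likely groups; every symbol inside gains one bit.
        c1, _, group1 = heapq.heappop(heap)
        c2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (c1 + c2, next(tie), group1 + group2))
    return lengths

counts = {"A": 0, "B": 9, "C": 12, "D": 40, "E": 30, "F": 5, "G": 4, "H": 0}
lengths = huffman_code_lengths(counts)
average_bits = sum(counts[s] * lengths[s] for s in lengths) / sum(counts.values())
print(lengths)
print(f"{average_bits:.2f} bits/pixel")
```

For this histogram the lengths agree with the code book in the table (1, 2, 3, 4, 5, and 5 bits for D, E, C, B, F, and G).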

To demonstrate the unambiguous nature of the code, consider the sequence of 14
pixels “CDDEEBEDFGCDEC”, which encodes to a sequence of 35 bits (2.5 bits per
pixel):

CDDEEBEDFGCDEC = 10000111110111101010110100100011100

The message is decoded by examining the order of bits. Since the first bit is not a
“0”, the first pixel cannot be the most common gray value “D”. Since the second bit is
not a “1”, the first pixel cannot be an “E”. The third bit is “0”, and therefore the first pixel
must be “C”:

100  00111110111101010110100100011100


Now  repeat:  the  fourth  bit  is  “0”,  and  only  “D”  has  this  first  bit: 


100  0  0111110111101010110100100011100 


Ditto,  “D”  is  the  third  character: 


100  0  0  111110111101010110100100011100 


The  fourth  pixel  begins  with  “11”,  and  therefore  is  “E”: 


100  0  0  11  1110111101010110100100011100 


Ditto for the fifth:


100  0  0  11  11  10111101010110100100011100 


The  sequence  continues  until  all  bits  have  been  decoded: 
100  0  0  11  11  1011  11  0  10101  10100  100  0  11  100
C    D  D  E   E   B     E   D  F      G      C    D  E   C

This  example  also  may  be  used  to  illustrate  the  pitfall  of  Huffman  coding.  If  a  bit  is 
garbled  (“flipped”),  then  it  is  likely  that  many  (if  not  all)  of  the  following  characters 
in  the  message  will  not  be  decodable,  because  the  redundancy  in  the  message  has 
been  removed.  Consider  the  example  where  the  sixth  bit  is  flipped  from  a  “1”  to  a 
“0”: 

10000011110111101010110100100011100

In this case, the decoded message would be:

100  0  0  0  11  11  0  11  11  0  10101  10100  100  0  11  100
C    D  D  D  E   E   D  E   E   D  F      G      C    D  E   C

instead of:

C D D E E B E D F G C D E C

The “flipped” bit resulted in the incorrect decoding of the four samples “EEBE” as the
six samples “DEEDEE”; in this instance the decoder happens to regain synchronization
afterward.
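
Both the decoding walk-through and the effect of a flipped bit can be reproduced with a short Python sketch. The code book is the one derived above; the function names and the simple decoder (which grows a bit string until it matches a codeword, relying on the fact that no codeword is a prefix of another) are our own illustration.

```python
CODEBOOK = {"D": "0", "E": "11", "C": "100", "B": "1011", "F": "10101", "G": "10100"}

def encode(message, codebook=CODEBOOK):
    """Concatenate the codewords of the symbols in the message."""
    return "".join(codebook[symbol] for symbol in message)

def decode(bits, codebook=CODEBOOK):
    """Decode a bit string by accumulating bits until they match a codeword."""
    inverse = {code: symbol for symbol, code in codebook.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(decoded)

bits = encode("CDDEEBEDFGCDEC")
# Flip the sixth bit (index 5) to simulate a transmission error.
flipped = bits[:5] + ("0" if bits[5] == "1" else "1") + bits[6:]
print(decode(bits))
print(decode(flipped))
```

With the sixth bit flipped, the decoder produces 16 symbols instead of 14. For this particular error it falls back into step after a few symbols, but nothing in the code guarantees that in general.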


This code exceeds the theoretical limit of Shannon information by only 0.04
bits/pixel, and reduces the number of bits required to store the image to 2.17/3,
or 72%, of that required for the uncompressed original. The efficiency of the
Huffman code is defined as the ratio of the theoretical limit determined by Shannon’s
definition to the average number of bits per pixel for the code. In this case, the
compression efficiency is:

$$\frac{2.131}{2.17} = 0.982$$


For real images, lossless compression via a Huffman code on a pixel-by-pixel basis
can achieve compression efficiencies in the range of 0.25 < ρ < 0.67, a modest
improvement. Note that the code book must be available to the receiver to allow
reconstruction of the image. This extra data is known as the overhead of the compression
scheme and has not been included in the calculations of compression efficiency. The
ideal length of the Huffman codeword of a gray level whose histogram probability is p is
−log2[p]; e.g., if the probability is 0.125, the ideal codeword length is 3 bits. If −log2[p]
is significantly different from an integer (e.g., p = 0.09 ⟹ −log2[p] = 3.47 bits),
the coding efficiency will suffer.

A significant shortcoming of a Huffman code is that it cannot adapt to locally
varying image statistics, or equivalently, a particular Huffman code will not be
optimum for a variety of images. In fact, inappropriate application of a Huffman code
can actually lead to an increase in the storage requirements over the original, e.g., if
the image contains many levels which were rare in the original source image.

20.4.3 Information in Correlated Images - Markov Model

We’ve just seen how the Huffman code takes advantage of clustering of the histogram
to reduce the number of bits required to store/transmit an image whose gray levels
are not equally populated, assuming that the gray values of pixels are obtained
from a discrete memoryless source, i.e., the gray value at a pixel is a number selected
at random from the probability distribution (the image histogram). Obviously, this
last assumption is false for most (if not all) realistic images consisting of objects whose
component pixels have similar properties (e.g., gray level, color, or texture). These
correlations provide a context for a pixel and add additional redundancy, which may
be exploited to achieve additional compression. Redundancy may be considered as
creating clusters in particular histograms generated from the original images.
Examples of redundancy include similarities of pixel gray level in local neighborhoods
(interpixel redundancy) in a single image, which may be exploited by constructing
codes for groups of pixels of a single image (vector coding) or by coding linear
combinations of blocks of pixels, as in the JPEG standard. Similarities in color (spectral
redundancy) generate clusters in the multispectral histogram, thus reducing
data-transmission/storage requirements in color images. This clustering is used by the
NTSC video transmission standard and the Kodak PhotoCD™. Correlations of
corresponding pixels across image frames in a motion picture (temporal redundancy)
allow significant additional compression and are exploited in the MPEG standards.
In addition, images meant for human viewing may be compressed by removing image
content that is not visible to the eye; these subjective redundancies or superfluous
information are present in the spectral, spatial, and temporal dimensions, and are
utilized for additional compression in all of the consumer compression standards, such
as JPEG, MPEG, and PhotoCD™.

The  statistical  properties  of  a  correlated  image  are  more  complicated  than  those 
of  a  discrete  memoryless  (random)  source,  and  one  of  the  difficulties  in  developing 
efficient  standard  algorithms  for  image  compression  is  the  creation  of  a  mathematical 
model  of  these  various  redundancies.  The  simplest  model  of  interpixel  redundancies 
is  the  Markov  source,  where  the  probability  that  the  pixel  located  at  coordinate  n 
has gray level f is a function of the gray level at some number of neighboring pixels.
The  number  of  these  neighboring  pixels  is  the  order  of  the  Markov  source;  the  higher 
the  order,  the  more  correlated  are  the  gray  values  in  a  neighborhood  and  the  more 
predictable  the  image  content  from  previous  levels.  Thus  we  should  be  able  to  define 
a  measure  of  information  for  a  Markov  source  that  is  less  than  the  normal  entropy. 
For  a  correlated  source  model,  such  as  a  Markov  source  of  order  1,  we  can  define  the 
first-order,  or  conditional,  entropy  of  the  image  with  M  gray  levels  as: 


$$I[f_k | f_{k-1}] = -\sum_{i=0}^{M-1} \sum_{j=0}^{M-1} p_{i,j} \log_2 [p_{i,j} | p_j]$$

where $f_k$ and $f_{k-1}$ are the gray levels of the kth and (k−1)st pixels and $I[f_k | f_{k-1}]$
is the information in the kth pixel $f_k$ given that the (k−1)st pixel has gray value
$f_{k-1}$. The conditional probability of $p_{i,j}$ given $p_j$ is denoted $[p_{i,j} | p_j]$. It may be clearer
to think of the conditional entropy as just the information content of the set of 2-D
gray-level “vectors” $\mathbf{f} = [f_1, f_2]$:

$$I[\mathbf{f}] = -\sum_{\mathbf{f}} p[\mathbf{f}] \log_2 [p[\mathbf{f}]]$$

where the sum is over all the 2-D pixel vectors $\mathbf{f}$. Note that the total number of
gray-level states of the 2-D histogram is the square of the number of gray levels; the
vector histogram of an image with M levels has $M^2$ bins.

The first-order entropy is the average information content of the experiment at
the kth pixel given that the (k−1)st pixel gray value is known. If the gray values
are correlated, the first-order entropy per pixel may be much less than the Shannon
entropy (pixels considered independently); in other words, the vector histogram of the
image is clustered. Note that the Shannon entropy may be considered as the
zeroth-order, or scalar, entropy of the image. If the pixel gray levels are independent and
equally likely (random DMS source), then the entropy of the pixel pairs will be
log2[M²] = 2·log2[M] bits per pair (log2[M] bits per pixel)
because every pair of gray levels will be equally likely; the 2-D vector histogram is
thus flat, i.e., all 2-D vectors (whose components are gray levels of adjacent pixels)
will be equally likely. The image from a first-order Markov source may have a flat
histogram, but the 2-D histogram of neighboring pixels may be clustered; 2-D vector
coding of such an image will exhibit significant data compression.


20.4.4  “Vector”  Coding  (Compression) 


Images  generated  by  a  Markov  source  often  may  be  compressed  by  examining  the 
histogram  of  groups  of  pixels  to  look  for  correlations.  As  a  simple  example  of  the 
difference  between  a  DMS  and  a  Markov  source,  consider  first  a  bilevel  DMS  with  a 
priori  probabilities  p  [0]  =  0.2  and  p  [1]  =  0.8.  In  words,  the  output  of  the  source  is 
more  likely  a  1  (white)  than  a  zero.  A  FAX  output  is  a  bilevel  image  that  might  have 
this  probability  function.  The  Shannon  entropy  of  this  source  is: 

I  =  -0.2  log2  [0.2]  -  0.8  log2  [0.8] 

=  0.7219  bits  per  pixel 


Now consider the probabilities of the four cases for two pixels generated by the DMS.
Since the pixels are independent, the probability that the current pixel is a “1” given
that the previous pixel is a “1” is just the product of the probabilities; there is no
influence of the previous pixel on the choice of the current pixel:

f[n] = [1, 1]:  p[1, 1] = p[1] · p[1] = 0.8 · 0.8 = 0.64

Similarly, the probabilities of the other three pairs are:

f[n] = [0, 0]:  p[0, 0] = p[0] · p[0] = 0.2 · 0.2 = 0.04
f[n] = [1, 0]:  p[1, 0] = p[1] · p[0] = 0.8 · 0.2 = 0.16
f[n] = [0, 1]:  p[0, 1] = p[0] · p[1] = 0.2 · 0.8 = 0.16


Note  that  the  sum  of  these  conditional  probabilities  is  still  unity,  as  required.  The 
entropy  of  the  conditional  probabilities  is: 

$$I = -0.64 \log_2[0.64] - 0.04 \log_2[0.04] - 0.16 \log_2[0.16] - 0.16 \log_2[0.16] = 1.4438\ \text{bits per element}$$

Since there are two pixels per element, the entropy of the pixels taken two at a time
is just:

$$I = \frac{1.4438\ \text{bits per pair}}{2\ \text{pixels per pair}} = 0.7219\ \text{bits per pixel}$$


which  is  identical  to  the  scalar  entropy  of  the  DMS.  In  words,  the  information  in  the 
pixels  from  the  DMS  taken  two  at  a  time  is  identical  to  that  from  the  DMS  taken  one 
at  a  time;  there  is  no  additional  compression  because  there  is  no  interpixel  correlation. 


In a realistic imaging situation, we might expect that black and white pixels
will be grouped together in the message. Thus the probabilities p[1, 1] and p[0, 0]
would be larger in a real image than for a discrete memoryless source. To
ensure that the sum of the probabilities is unity, p[0, 1] and p[1, 0] would
be expected to decrease, and also to be the same, since we would expect the same
number of transitions from black to white as from white to black. A possible table of
probabilities from a first-order Markov source with p[0] = 0.2 and p[1] = 0.8 would
be:

p[1, 1] = 0.80
p[0, 0] = 0.16
p[0, 1] = 0.02
p[1, 0] = 0.02


The  resulting  entropy  of  the  pixels  taken  two  at  a  time  is: 


$$I = -0.80 \log_2[0.80] - 0.16 \log_2[0.16] - 0.02 \log_2[0.02] - 0.02 \log_2[0.02] = 0.9063\ \text{bits per element} = 0.4532\ \text{bits per pixel}$$

This is a significant reduction from the entropy of 0.7219 bits per pixel of the DMS for
this example of a first-order Markov source. The additional compression is due to interpixel
correlation.
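
The comparison between the memoryless source and the Markov source can be reproduced numerically. In the sketch below the pair probabilities are the ones used in the text (products of 0.2 and 0.8 for the DMS; 0.80, 0.16, 0.02, 0.02 for the correlated source); the function and variable names are ours.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a list of probabilities (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

p0, p1 = 0.2, 0.8
scalar_bits = entropy([p0, p1])                   # one pixel at a time

dms_pairs = [p1 * p1, p0 * p0, p1 * p0, p0 * p1]  # independent pixels
markov_pairs = [0.80, 0.16, 0.02, 0.02]           # correlated pixels (table in the text)

dms_bits_per_pixel = entropy(dms_pairs) / 2       # two pixels per pair
markov_bits_per_pixel = entropy(markov_pairs) / 2

print(f"scalar: {scalar_bits:.4f}")
print(f"DMS pairs: {dms_bits_per_pixel:.4f}")
print(f"Markov pairs: {markov_bits_per_pixel:.4f}")
```

Taking pixels two at a time gains nothing for the memoryless source, whereas the clustered pair histogram of the correlated source needs noticeably fewer bits per pixel.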

The concept may be extended to higher-order entropies. If the source is second-order
Markov, the gray value of a pixel is influenced by those of two adjacent pixels. In
this  case,  the  set  of  3-D  vectors  defined  by  triplets  of  gray  values  would  be  clustered, 
and  3-D  vector  coding  will  further  reduce  the  number  of  bits. 

As an example of the effect of spatial correlations on image compression, consider
first the 4×4 2-bit image shown:

f[n,m] =
0 1 0 0
0 1 2 2
0 1 2 3
1 2 2 3

H[f] = [5, 4, 5, 2],  p[f] = [5/16, 4/16, 5/16, 2/16]

$$I[f] = -\left(\frac{5}{16}\log_2\left[\frac{5}{16}\right] + \frac{4}{16}\log_2\left[\frac{4}{16}\right] + \frac{5}{16}\log_2\left[\frac{5}{16}\right] + \frac{2}{16}\log_2\left[\frac{2}{16}\right]\right) = 1.924\ \text{bits/pixel}$$


Because the histogram of this image is approximately “flat”, little (if any) benefit
would  be  obtained  by  using  a  Huffman  code.  But  note  that  adjacent  pairs  of  pixels 
exhibit  some  correlation;  if  a  pixel  is  dark  (0  or  1),  it  is  more  likely  that  its  neighbor 
to  the  right  is  dark  than  bright.  This  is  demonstrated  by  constructing  the  two- 
dimensional  histogram  of  the  pairs  of  left-right  pixels  of  f[n,m] : 


Number of pixel pairs for each combination of gray values:

                      LEFT SIDE
Gray Values       0    1    2    3
RIGHT SIDE   0    1    0    0    0
             1    3    0    0    0
             2    0    1    1    0
             3    0    0    2    0

Note that each pixel appears in only one pair; there is no double-counting. The
sum of the elements of the 2-D histogram is 8, and thus there are 16 pixels (8 pairs) in
f[n,m]. Also note that there is a total of 16 possible elements in the 2-D histogram,
four times as many as in the 1-D histogram; the 2-D histogram of a 2-bit image is
a 4-bit image. Even so, if the 2-D histogram of pixel pairs is clustered, it may be
possible to encode the representation in fewer bits. The entropy of this specific 2-D
histogram is:

$$I[\mathbf{f}] = -\left(\frac{3}{8}\log_2\left[\frac{3}{8}\right] + 3 \cdot \frac{1}{8}\log_2\left[\frac{1}{8}\right] + \frac{2}{8}\log_2\left[\frac{2}{8}\right]\right) = 2.156\ \frac{\text{bits}}{\text{element}}$$

Because there are two pixels per element, the information content per pixel in the
2-D histogram is:

$$\frac{2.156\ \text{bits}}{2\ \text{pixels}} = 1.078\ \frac{\text{bits}}{\text{pixel}} < 1.924\ \frac{\text{bits}}{\text{pixel}}$$

The  reduction  in  entropy  of  the  image  is  due  to  the  correlations  between  neighboring 
gray  values.  As  before,  the  elements  of  the  2-D  histogram  may  be  coded  by  a  Huffman 
procedure.  This  approach  is  called  vector  coding  or  block  coding ,  because  each  pixel 
pair  describes  a  two-dimensional  vector  of  gray  levels.  In  analogous  fashion,  the 
original  Huffman  approach  to  encode  individual  pixels  is  sometimes  called  scalar 
coding. 
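
The scalar and vector entropies of the 4×4 example can be computed directly from the pixel values. This Python sketch assumes non-overlapping left-right pairs, as in the text; the helper name `entropy_from_counts` is ours.

```python
import math
from collections import Counter

def entropy_from_counts(histogram):
    """Entropy in bits per element of a Counter of occurrences."""
    total = sum(histogram.values())
    return -sum((n / total) * math.log2(n / total) for n in histogram.values())

image = [
    [0, 1, 0, 0],
    [0, 1, 2, 2],
    [0, 1, 2, 3],
    [1, 2, 2, 3],
]

# 1-D histogram: every pixel counted individually.
scalar_hist = Counter(value for row in image for value in row)
# 2-D histogram: non-overlapping (left, right) pairs, so each pixel is used once.
pair_hist = Counter((row[i], row[i + 1]) for row in image for i in range(0, len(row), 2))

scalar_bits = entropy_from_counts(scalar_hist)         # bits per pixel
pair_bits = entropy_from_counts(pair_hist)             # bits per pair
print(f"{scalar_bits:.3f} bits/pixel (scalar)")
print(f"{pair_bits / 2:.3f} bits/pixel (pairs)")
```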

Of course, it is feasible to encode larger blocks, which is equivalent to constructing
gray-level vectors with more dimensions. For example, consider the 4×4 3-bit image:

0 1 0 1
2 3 2 3
0 1 0 1
2 3 2 3


If  we  encode  2x2  blocks  of  four  pixels,  the  four  resulting  four-dimensional  vectors 
are  identical.  Because  the  4-D  histogram  has  only  one  populated  bin,  the  information 
content  of  the  vector  image  is  0  bits  per  block,  or  0  bits  per  pixel.  Of  course,  we  must 
also transmit the gray-level information of the block (i.e., the codebook that specifies
the  gray  values  assigned  to  each  code).  This  represents  the  necessary  overhead  of  the 
code.  In  this  case,  the  number  of  required  bits  is  determined  by  the  codebook  alone. 

Note that the effectiveness of a vector code depends strongly on gray-level
correlations of the image. A vector code that is effective for one image (e.g., a human face)
may  be  ridiculously  ineffective  for  a  different  type  of  image  (e.g.,  an  aerial  image  of 
Kuwait).  Conversely,  a  vector  code  that  is  appropriate  for  a  particular  type  of  image 
will  likely  be  so  for  images  of  objects  in  the  same  class.  Also  note  that  if  the  vector 
histogram  is  flat  (i.e.,  approximately  equal  populations  in  each  bin),  then  there  will 
be  no  reduction  in  storage  requirements  obtained  by  using  the  Huffman  code. 

Example  —  Entropy  of  the  English  Alphabet 

Sources: N. Abramson, Information Theory and Coding, McGraw-Hill, 1963.
J.R. Pierce, An Introduction to Information Theory.
C.E. Shannon, “Prediction and Entropy of Printed English,” Bell Syst. Tech. Jour. 30, pp. 50-64, 1951.

The simplest messages in the English language may be written with 26 letters (one
case) and the space. If these 27 characters were equally probable, the information
content in a message would be:

$$I = -\sum_{i=1}^{27} \frac{1}{27} \log_2\left[\frac{1}{27}\right] = -\log_2\left[\frac{1}{27}\right] = +\log_2[27] = 4.755\ \text{bits per character}$$


A  sample  of  typical  text  with  equal  probabilities  is: 

XFOML  RXKHRJFFJUJ  ZLPWCFWKCYJFFJ  EYVKCQSGHYD  QPAAMKB 

Of  course,  the  probabilities  of  English  characters  are  not  equal.  The  histogram  of  the 
characters  may  be  determined  from  a  statistical  study  of  words.  Abramson  gives  the 
following table of character occurrences:


Symbol   Probability   Symbol    Probability
A        0.0642        N         0.0574
B        0.0127        O         0.0632
C        0.0218        P         0.0152
D        0.0317        Q         0.0008
E        0.1031        R         0.0484
F        0.0208        S         0.0514
G        0.0152        T         0.0796
H        0.0467        U         0.0228
I        0.0575        V         0.0083
J        0.0008        W         0.0175
K        0.0049        X         0.0013
L        0.0321        Y         0.0164
M        0.0198        Z         0.0005
                       (space)   0.1859

[Figure: Histogram of Letter Occurrences in English Text; horizontal axis: Symbol]

Graph of histogram of letter occurrences in the English language if the case of the
character is not considered.


Using this “1-D” histogram of probabilities (i.e., assuming that the characters
occur “independently” but are taken from this histogram of probabilities), the
entropy of an English-language message would be somewhat less than for equally likely
occurrences:

$$I = \sum_{i=1}^{27} \left(-p_i \log_2 [p_i]\right) = 4.080\ \text{bits per character}$$
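
The entropy of the single-character histogram can be checked against the table of probabilities quoted from Abramson. The sketch below simply transcribes that table into a Python dictionary (the variable names are ours, and a blank stands for the space character) and evaluates the sum.

```python
import math

# Character probabilities transcribed from the table in the text; " " is the space.
LETTER_PROBABILITY = {
    "A": 0.0642, "B": 0.0127, "C": 0.0218, "D": 0.0317, "E": 0.1031,
    "F": 0.0208, "G": 0.0152, "H": 0.0467, "I": 0.0575, "J": 0.0008,
    "K": 0.0049, "L": 0.0321, "M": 0.0198, "N": 0.0574, "O": 0.0632,
    "P": 0.0152, "Q": 0.0008, "R": 0.0484, "S": 0.0514, "T": 0.0796,
    "U": 0.0228, "V": 0.0083, "W": 0.0175, "X": 0.0013, "Y": 0.0164,
    "Z": 0.0005, " ": 0.1859,
}

total_probability = sum(LETTER_PROBABILITY.values())
letter_entropy = -sum(p * math.log2(p) for p in LETTER_PROBABILITY.values())
uniform_entropy = math.log2(len(LETTER_PROBABILITY))  # all 27 characters equally likely

print(f"sum of probabilities: {total_probability:.4f}")
print(f"entropy: {letter_entropy:.2f} bits per character")
print(f"equally likely: {uniform_entropy:.3f} bits per character")
```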

A  sample  of  text  that  might  be  selected  from  this  probability  distribution  is: 

OCRO  HLI  RGWR  NMIELWIS  EU  LL  NBNESEBYA  TH  EEI  ALHENHTTPA  L 

Just as obviously, we know that English characters do not occur “independently”;
the probability of occurrence of a character depends on the particular characters in
the preceding sequence. The next most realistic description is based on the frequency
of occurrence of pairs (“digrams”) of characters, and thus on the 2-D histogram of
27² = 729 pairs. We know that some combinations (e.g., “QX”, “ZJ”) occur very
rarely if at all, and so these “bins” of the 2-D histogram will be unoccupied. Therefore,
the 2-D histogram of digrams is “clustered” and thus we expect some characters
to be “predictable”. Therefore the information content of those characters will be
decreased. The histogram of character frequencies may be computed from statistical
digram frequency tables that were constructed by cryptographers. Shannon computed
the resulting entropy to be:

$$I(\text{27 characters as pairs}) = \sum_{i=1}^{27} \sum_{j=1}^{27} \left(-p[i|j] \log_2 [p[i|j]]\right) = 3.56\ \text{bits per character}$$

Typical text selected from the 2-D histogram is:

ON  IE  ANTSOUTINYS  ARE  T  INCTORE  ST  BE  S  DEAMY 
ACHIN  D  ILONASIVE  TUCOOWE  AT 

Note  that  this  text  appears  to  be  (perhaps)  slightly  more  “intelligible”  than  that 
selected  from  the  independent  histogram  -  some  characters  almost  form  words!  (e.g., 
inctore,  deamy) 

To continue the idea, consider the computation of entropy based on “trigrams”,
or triplets of characters. Shannon computed the entropy:

$$I(\text{27 characters as triplets}) = \sum_{i=1}^{27} \sum_{j=1}^{27} \sum_{k=1}^{27} \left(-p[i|j|k] \log_2 [p[i|j|k]]\right) = 3.3\ \text{bits per character}$$

A  typical  sample  of  such  text  is: 

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME
OF  DEMONSTURES  OF  THE  REPTAGIN  IS  REGOACTIONA  OF  CRE 

Note  that  as  more  characters  are  included  in  a  group  for  statistical  computation  of  the 
entropy,  the  more  the  typical  text  resembles  English.  Shannon  continued  the  process 
and  estimated  the  upper  and  lower  bounds  of  the  entropy  per  character  for  groups 
of  English  letters  up  to  15  characters  and  also  for  100  characters.  The  entropy  per 
character  approaches  a  limit  for  more  than  9  or  so  letters  in  the  interval  1  <  I  <  2 
bits  per  character,  but  drops  to  the  range  0.6  <  I  <  1.3  bits  for  groups  of  100. 

This  means  that  characters  in  long  strings  of  English  characters  are  correlated;  the 
100-dimensional  histogram  of  groups  of  100  characters  exhibits  clustering. 

Of  course,  messages  could  be  constructed  from  tables  of  word  frequencies  instead 
of  character  frequencies.  A  message  based  on  first-order  word  frequencies  (one  word 
at  a  time)  is: 

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT
NATURAL  HERE  HE  THE  A  IN  CAME  THE  TO  OF  TO  EXPERT  GRAY  COME 
TO  FURNISHES  THE  LINE  MESSAGE  HAD  BE  THESE 

and  a  message  using  second-order  frequencies  is: 


“THE  HEAD  AND  IN  FRONTAL  ATTACK  ON  AN  ENGLISH  WRITER  THAT 
THE  CHARACTER  OF  THIS  POINT  IS  THEREFORE  ANOTHER  METHOD  FOR 
THE  LETTERS  THAT  THE  TIME  OF  WHO  EVER  TOLD  THE  PROBLEM  FOR 
AN  UNEXPECTED” 

20.4.5 Other Flavors of Huffman Coding

Modified Huffman Codes

In many cases of both message and image compression, the codebook is quite large but includes many symbols with very small probabilities. It often is useful to combine the very unlikely symbols into a single symbol ELSE, which is encoded by the Huffman process and is transmitted along with the actual binary code for the character or gray level. A variation of this scheme is used for digital facsimile.
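The grouping step can be sketched in a few lines of Python. This is an illustration under assumptions, not the facsimile standard: the threshold value and the function name `collapse_rare` are hypothetical, and the Huffman construction itself is not shown.

```python
from collections import Counter

def collapse_rare(symbols, threshold):
    """Merge every symbol whose relative frequency is below `threshold` into 'ELSE'."""
    counts = Counter(symbols)
    n = len(symbols)
    merged = Counter()
    for s, c in counts.items():
        merged[s if c / n >= threshold else "ELSE"] += c
    return merged

# The merged histogram would then be passed to the Huffman procedure.
print(collapse_rare("aaaaaaaabc", 0.2))
```

When ELSE is transmitted, the actual fixed-length code of the rare symbol follows it, so the decoder can still recover the symbol exactly.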

Adaptive  Huffman  Codes 

Another modification of Huffman coding allows the process to adapt to varying statistics, but the algorithms are complicated to implement.

20.4.6  Arithmetic  Coding 

IBM  Jour.  Res.  &  Dev.,  32(6),  November  1988 
Gonzalez  and  Woods,  pp.  348-9 
Rabbani  and  Jones,  §3.5 

Like the Huffman code, the result of an arithmetic code is a sequence of variable-length code symbols, but the symbols are not assigned to pixels (or blocks thereof) which were quantized to fixed numbers of bits. In other words, the arithmetic code for a group of gray levels is not restricted to an integer number of bits per gray level. For example, consider encoding a bitonal image (such as a FAX), where one level (probably white) is much more likely than the other. Because there are only two levels, a Huffman code applied to individual pixels cannot improve on one bit per pixel. Instead, the arithmetic code is a tree code, where one codeword is assigned to each string of input pixels of some fixed length. The generated codeword for this string is a representation of an interval on the real line within the range [0,1) whose length is proportional to the likelihood of the string. If the string to be coded is lengthened, then the corresponding subinterval becomes shorter and requires more bits to distinguish it from its neighbors. A slight change in the sequence can result in a significantly different codeword.

The algorithm for generating an arithmetic code is most easily described by example. Consider a sequence of 1-bit characters where the frequencies of 0 and 1 are 3/4 and 1/4, respectively. One example of such a sequence is 0010. The unit interval is divided into subintervals based on the order of occurrence of characters (quantized pixels) in the sequence. If the next character in the message is 0, the bottom 3/4 of the interval will be selected; if 1, the upper 1/4 will be selected. At the start the full interval is:

$$0 \le x_0 < 1$$

The first character in the message is “0” with known frequency of 3/4; the unit interval is shrunk to the lower 3/4:

$$0 \le x_1 < \frac{3}{4}, \quad |x_1| = \frac{3}{4}$$
The second character also is “0”, and the interval is subdivided to the lower 3/4 again:

$$0 \le x_2 < \frac{9}{16}, \quad |x_2| = \frac{9}{16}$$

The third character is “1” with frequency 1/4, so the next subinterval is the upper 1/4 of $x_2$:

$$\frac{9}{16} - \frac{1}{4}\cdot\frac{9}{16} = \frac{27}{64} \le x_3 < \frac{9}{16}, \quad |x_3| = \frac{9}{64}$$

$$(0_{\wedge}011011)_2 < (0_{\wedge}1)_2 < (0_{\wedge}10000111)_2$$

where the symbol “∧” is the “binary point” (analogous to the “decimal point”; it separates the bits for nonnegative and negative powers of 2). The last character in the message is “0”, so the interval is subdivided to the lower 3/4:


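$$\frac{27}{64} = 0.421875 \le x_4 < \frac{27}{64} + \frac{3}{4}\cdot\frac{9}{64} = \frac{135}{256} = 0.52734375, \quad |x_4| = \frac{27}{256}$$

Any point in the subinterval $x_4$ can be used as the code for the string, because that subinterval can only be obtained by that specific sequence of m input characters (here m = 4).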

Because the probabilities are multiples of powers of two in this specific example, the length of the subinterval is easily represented as a binary fraction. Though this is not always true, it is useful to continue the analysis to show how the code may be represented. Recall that fractional numbers may be represented in binary notation by breaking up into inverse powers of 2, so that the bit closest to the binary point represents $2^{-1}$, the second bit represents $2^{-2}$, and so on. The endpoints of the interval $x_4$ in this example are:


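$$\frac{27}{64} = \frac{0}{2} + \frac{1}{4} + \frac{1}{8} + \frac{0}{16} + \frac{1}{32} + \frac{1}{64} + \frac{0}{128} + \frac{0}{256} = (0_{\wedge}01101100)_2 \rightarrow (0_{\wedge}011011)_2$$

$$\frac{135}{256} = \frac{1}{2} + \frac{0}{4} + \frac{0}{8} + \frac{0}{16} + \frac{0}{32} + \frac{1}{64} + \frac{1}{128} + \frac{1}{256} = (0_{\wedge}10000111)_2$$

$$(0_{\wedge}011011)_2 \le x_4 < (0_{\wedge}10000111)_2$$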


The arithmetic code for the sequence “0010” is produced by selecting the representation of ANY binary fractional number in the interval $x_4$, though the shortest binary fractional number will give the shortest code. In this example, the binary fractional number $(0_{\wedge}1)_2$ could be selected because it lies between the endpoints:

$$(0_{\wedge}011011)_2 < (0_{\wedge}1)_2 < (0_{\wedge}10000111)_2$$


The  binary  sequence  to  the  right  of  the  binary  point  is  the  encoded  sequence,  which 
in  this  case  is  the  one-bit  number  1;  four  letters  have  been  encoded  with  a  single  bit. 

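As a second example, consider encoding of the sequence “1000” with the same a priori probabilities. The first subinterval is the upper fourth of the unit interval:

$$\frac{3}{4} \le x_1 < 1, \quad |x_1| = \frac{1}{4}$$

The second subinterval is the lower 3/4 of $x_1$:

$$\frac{3}{4} \le x_2 < \frac{3}{4} + \frac{3}{4}\cdot\frac{1}{4} = \frac{15}{16}, \quad |x_2| = \frac{3}{16}$$

The third subinterval is the lower 3/4 of $x_2$:

$$\frac{3}{4} \le x_3 < \frac{3}{4} + \left(\frac{3}{4}\right)^2\cdot\frac{1}{4} = \frac{57}{64}, \quad |x_3| = \frac{9}{64}$$

The final subinterval is the lower 3/4 of $x_3$:

$$\frac{3}{4} \le x_4 < \frac{3}{4} + \left(\frac{3}{4}\right)^3\cdot\frac{1}{4} = \frac{219}{256}, \quad |x_4| = \frac{27}{256}$$

The binary codes for the endpoints are:

$$\frac{3}{4} = \frac{1}{2} + \frac{1}{4} = (0_{\wedge}11000000)_2$$

$$\frac{219}{256} = \frac{1}{2} + \frac{1}{4} + \frac{0}{8} + \frac{1}{16} + \frac{1}{32} + \frac{0}{64} + \frac{1}{128} + \frac{1}{256} = (0_{\wedge}11011011)_2$$

$$(0_{\wedge}11000000)_2 \le x_4 < (0_{\wedge}11011011)_2$$

The code for the subinterval is the lower limit $(0_{\wedge}11)_2$, thus encoding the four-bit sequence with two bits.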

Endpoints for and lengths of other possible sequences are:

$$\text{“0000”} \rightarrow [0, (0.75)^4) = [0, 0.31640625) \rightarrow |x_4| = (0.75)^4$$

$$\text{“1111”} \rightarrow [1 - (0.25)^4, 1) = [0.99609375, 1) \rightarrow |x_4| = (0.25)^4 = 0.0039\cdots$$

$$\text{“1010”} \rightarrow [0.890625, 0.92578125) \rightarrow |x_4| = 0.03515625$$

$$\text{“1011”} \rightarrow [0.92578125, 0.9375) \rightarrow |x_4| = 0.01171875$$

Notice that strings of more frequent characters yield longer subintervals, while strings of rare characters result in short subintervals. If the histogram of input characters is flat, then the intervals derived by the arithmetic code to represent a string of characters will be of equal length and will require equal numbers of bits to represent them. If the histogram is clustered so that some characters are more likely to occur, then strings of frequently occurring characters will be mapped to longer subintervals within [0,1), and may be represented by short codes which indicate a location in the interval.
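The interval subdivision used in these examples can be sketched in Python with exact fractions. This is a minimal illustration for a two-symbol source with the assumed probabilities 3/4 and 1/4; the function names `interval_for` and `shortest_code` are hypothetical, and a practical coder would use finite-precision arithmetic rather than exact rationals.

```python
from fractions import Fraction

def interval_for(bits, p0=Fraction(3, 4)):
    """Return (low, high) of the subinterval of [0,1) selected by a string of '0'/'1'."""
    low, width = Fraction(0), Fraction(1)
    for b in bits:
        if b == "0":               # lower part of the current interval
            width *= p0
        else:                      # upper part of the current interval
            low += width * p0
            width *= 1 - p0
    return low, low + width

def shortest_code(low, high):
    """Bits after the binary point of the shortest binary fraction x with low <= x < high."""
    nbits = 0
    while True:
        nbits += 1
        scale = 2 ** nbits
        k = -((-low.numerator * scale) // low.denominator)   # ceil(low * scale)
        if Fraction(k, scale) < high:
            return format(k, "0{}b".format(nbits))

for msg in ("0010", "1000", "1010", "1011"):
    lo, hi = interval_for(msg)
    print(msg, lo, hi, hi - lo, shortest_code(lo, hi))
```

For “0010” the sketch returns the interval [27/64, 135/256) and the one-bit code 1; for “1000” it returns [3/4, 219/256) and the two-bit code 11.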

To decode the sequences, the a priori probabilities must be known. The first two examples will be demonstrated, where the codes were binary strings. First, the binary points are added on the left of the code to yield:

$$x = (0_{\wedge}1)_2 = \frac{1}{2}$$

for the first example. The decision point which determines the first character is located at $t_1 = 3/4$, so that $x < t_1$, specifying that the first character must have probability 3/4, i.e., it is 0. The decision point for the second character is at the lower 3/4 of the interval determined by $t_1$, i.e., $t_2 = 9/16$. Again $x < t_2$, so the second character also must be 0. The decision point for the third character is located at:

$$t_3 = \frac{3}{4}\cdot t_2 = \frac{27}{64} < x = \frac{1}{2}$$

Because the coded point is larger than $t_3$, the third character is a 1 with probability equal to 0.25. The final decision point divides the upper quarter of the interval in the proportion 3:1:

$$t_4 = \frac{135}{256} > x$$

So the fourth character also is 0.


In the second example, the code is

$$x = (0_{\wedge}11)_2 = \frac{3}{4}$$

The decision point which determines the first character is located at $t_1 = 3/4$, so that $x = t_1$. By the convention specified for the unit interval, a point at the threshold is in the upper subinterval. This specifies that the first character has probability 1/4 and is 1. The decision point for the second character divides the first subinterval in proportion 3:1, so that:

$$t_2 = \frac{3}{4} + \left(\frac{3}{4}\cdot\frac{1}{4}\right) = \frac{15}{16}$$

Because $x < t_2$, the second character is 0. The decision point for the third character is located at:

$$t_3 = \frac{3}{4} + \left(\left(\frac{3}{4}\right)^2\cdot\frac{1}{4}\right) = \frac{57}{64}$$

Because $x < t_3$, the third character is 0. The fourth threshold divides the third subinterval in the same ratio, and is located at:

$$t_4 = \frac{3}{4} + \left(\left(\frac{3}{4}\right)^3\cdot\frac{1}{4}\right) = \frac{219}{256}$$

The coded point $x < t_4$, so the fourth character is 0.
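The sequence of threshold comparisons may be sketched in Python. The sketch assumes that the decoder knows both the a priori probability of “0” (3/4 here) and the number of characters in the message; the function name `decode_interval_code` is hypothetical.

```python
from fractions import Fraction

def decode_interval_code(code_bits, n_symbols, p0=Fraction(3, 4)):
    """Recover n_symbols characters from the bits to the right of the binary point."""
    # Value of the coded point x in [0,1)
    x = sum(Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(code_bits))
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n_symbols):
        t = low + width * p0       # decision point inside the current interval
        if x < t:
            out.append("0")
            width *= p0
        else:                      # a point at the threshold belongs to the upper subinterval
            out.append("1")
            low = t
            width *= 1 - p0
    return "".join(out)

print(decode_interval_code("1", 4))    # first example
print(decode_interval_code("11", 4))   # second example
```

The thresholds computed in the loop for the two codes are 3/4, 9/16, 27/64, 135/256 and 3/4, 15/16, 57/64, 219/256, matching the hand calculation.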

An arithmetic code may be generated by redrawing the chart of the Huffman coding process to make a Huffman decision tree, where each bit represents the result of a binary decision:


11    = D
1011  = A
10101 = E
10100 = F
100   = B
0     = C


A  decision  which  selects  a  code  symbol  from  a  set  of  symbols  is  decomposed  into 
a  sequence  of  binary  decisions.  The  codeword  may  be  considered  to  be  the  pointer 
to  the  gray  level  being  coded;  the  binary  codeword  defines  the  steps  taken  to  reach 
the  symbol;  the  more  probable  the  sequence,  the  wider  the  interval  of  the  pointer, 
and  the  shorter  the  decision  sequence  to  obtain  that  codeword.  The  advantage  of 
arithmetic  coding  is  that  it  can  be  adaptive,  i.e.,  the  code  can  adjust  to  changes  in 
the  relative  probabilities  of  symbols.  The  adaptive  arithmetic  coder  assumes  some  a 
priori  probability  distribution  of  characters  and  updates  the  distribution  after  sending 
(or  receiving)  each  new  character. 

This is the basis for the Q-coder, which was developed by IBM to encode binary sequences using a 12-bit register. The coder derives a robust estimate of the probability distribution as it reads the source symbols. Because of its adaptability, the Q-coder is effective for nonstationary sources.
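The idea of updating the probability estimate after each character can be illustrated with a simple count-based model. This sketch is not the Q-coder (whose estimator is not described here); it assumes an initial count of one for each symbol and reports the ideal code length, the sum of −log2 of the probability assigned to each character just before it is coded. The function name `adaptive_cost_bits` is hypothetical.

```python
from math import log2

def adaptive_cost_bits(bits):
    """Ideal code length (bits) of a '0'/'1' string under an adaptive count-based model."""
    counts = {"0": 1, "1": 1}      # assumed a priori model: both symbols equally likely
    total = 0.0
    for b in bits:
        p = counts[b] / (counts["0"] + counts["1"])
        total += -log2(p)          # cost of coding b with the current estimate
        counts[b] += 1             # encoder and decoder make the same update
    return total

print(round(adaptive_cost_bits("0" * 63 + "1"), 2))
```

Because the decoder applies the same update after decoding each character, no table of probabilities has to be transmitted.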

20.4.7  Dictionary-Based  Compression 

Huffman  and  arithmetic  coding  use  the  statistics  of  the  message/image  (or  a  model 
of  the  statistics)  to  construct  shorter  codes  for  frequently  occurring  characters/gray 
levels.  In  other  words,  single  gray  levels  (or  blocks  of  gray  levels)  are  represented 
by  strings  of  varying  length.  A  different  kind  of  compression  encodes  variable-length 
strings  as  single  characters,  or  tokens.  In  a  sense,  this  is  the  type  of  coding  used 
for  postal  ZIP  codes  -  5-digit  numbers  are  used  to  represent  individual  post  offices 
whose  names  and  addresses  require  strings  of  different  lengths.  Another  example 
is  the  common  cryptograph  where  a  word  may  be  coded  by  its  coordinates  in  the 
dictionary  (page  number  and  position  on  the  page). 

Dictionary-based compression replaces frequently occurring characters/strings/gray levels with symbols, often called tokens. This may be done statically, where the entire message is scanned for recurring strings to construct the dictionary, which will be optimum for a specific message and which must be communicated to the decoder. Adaptive codes may be constructed to build up the dictionary as the message is scanned, and the dictionary can adapt to varying statistics of the message characters. In this case, the size of the dictionary must be enlarged to allow new symbols to be entered as the statistics change. A common application is the compression of data before sending it to magnetic tape/disk. One nice side effect is an increase in the effective transfer rate of the tape. The QIC (Quarter-Inch Cartridge) tape standard, a common backup medium for computer disks, uses a dictionary-based compression scheme which intermixes plain text and dictionary symbols by adding a one-bit symbol to distinguish the two.

Though  these  dictionary-based  compression  schemes  were  introduced  for  storing 
text  files,  they  may  be  used  for  storing  image  or  graphics  files  as  well;  they  just 
assume  that  the  binary  files  are,  in  fact,  ASCII  data.  The  GIF  image  compression 
scheme  developed  by  CompuServe  is  a  dictionary-based  algorithm  that  was  designed 
for  graphics  files  and  images. 

Lempel-Ziv-Welch (LZW) Coding

During the discussion of Huffman coding, it became obvious that it is desirable to encode an image while it is being read, without knowledge of either the histogram of the final image or the correlations among neighboring pixels. A simple method for this type of real-time encoding was developed by Lempel and Ziv (A universal algorithm for sequential data compression, IEEE Trans. Info. Thry. 23, 337-343, 1977 and Compression of individual sequences via variable-rate coding, IEEE Trans. Info. Thry. 24, 530-536, 1978), with a later extension by Welch (A Technique for high-performance data compression, IEEE Computer 17, 8, 1984). Compression schemes based on these works are referred to as LZ77, LZ78, and LZW, respectively. LZ77 allowed a 4 KByte dictionary of symbols for the codebook; matches within the text with data strings already seen are encoded as fixed-length pointers. LZ77 uses a large sliding text window (several thousand characters) that scans over the text, viewing a large block of recently coded text and a small block of text to be coded. Text in the look-ahead buffer is compared to the dictionary to find matches. This may take much time during the compression step, but decompression is not so constrained. The length of the longest possible match is limited by the size of the look-ahead buffer.

LZ78 abandoned the concept of the text buffer and built up the dictionary of character strings in increments of one character. The encoded strings may be very long, thus allowing significant compression if strings repeat themselves frequently. The LZW process is used in several of the common PC shareware utilities for compressing data files (e.g., PKARC, PKZIP, PAK). The ARC scheme was introduced in 1985 and dominated compression of MS-DOS files for several years, in no small part due to the fact that it was available as shareware. LZW generates a fairly efficient code as pixel gray levels are read, without prior knowledge of the image statistics. For an m-bit image (2^m gray levels), the LZW implementation is:

1. Select the number of bits k in the codeword, for a total of 2^k codewords. The number k must be greater than m (preferably, k ≫ m).

2. Initialize the codes by setting the first 2^m codes to the available gray levels; this ensures that legitimate codes exist for all pixels, even after the code table is filled.

3. Read the gray level of the first pixel; it is the first element in the string “S” and is a member of the code table (from step 2).

4. If all pixels have been read, then output the code for “S” (k bits) and quit.

5. If pixels are left, then read the gray level P of the next pixel and append to “S” to create string “SP”.

6. If “SP” is in the table of codewords, then set “S” = “SP” and go to step 4; otherwise continue.

7. Append the codeword for “S” (k bits) to the coded image.

8. If there are unused codewords left, then add “SP” to the table.

9. Reset the string “S” = “P” and go to step 4.
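The steps above can be sketched in Python, together with a matching decoder that rebuilds the same table from the codes alone. This is a minimal illustration under assumptions: pixels are represented by single characters of a string, codes are returned as integers rather than packed k-bit words, and no entries are added once the table holds 2^k codewords. The names `lzw_encode` and `lzw_decode` are hypothetical.

```python
def lzw_encode(pixels, alphabet, k):
    """Return the list of integer codes for `pixels` using at most 2**k codewords."""
    table = {g: i for i, g in enumerate(alphabet)}   # step 2
    max_codes = 2 ** k
    codes = []
    s = pixels[0]                                    # step 3
    for p in pixels[1:]:                             # steps 4-5
        if s + p in table:                           # step 6
            s = s + p
        else:
            codes.append(table[s])                   # step 7
            if len(table) < max_codes:               # step 8
                table[s + p] = len(table)
            s = p                                    # step 9
    codes.append(table[s])                           # step 4: flush the last string
    return codes

def lzw_decode(codes, alphabet, k):
    """Invert lzw_encode by rebuilding the table as the codes are read."""
    table = list(alphabet)
    max_codes = 2 ** k
    prev = table[codes[0]]
    out = [prev]
    for c in codes[1:]:
        # A code equal to len(table) refers to the entry the encoder has just created.
        entry = table[c] if c < len(table) else prev + prev[0]
        if len(table) < max_codes:
            table.append(prev + entry[0])
        out.append(entry)
        prev = entry
    return "".join(out)

codes = lzw_encode("abababab", "ab", 3)
print(codes, lzw_decode(codes, "ab", 3))
```

The decoder never needs the table to be transmitted, because each new entry is the previously decoded string plus the first character of the current one.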

Consider this simple example of a 36-pixel 2-bit image (m = 2, 4 levels α, β, γ, δ), which requires 72 bits to transmit as is:

f[n] = αααβααγααδαααβαααγααβααβααααααααβααγ

Assume that k = 4, so that the codebook contains 2^4 = 16 symbols. From step 2, the first four codes are A = α, B = β, C = γ, and D = δ. The coding process proceeds in this manner:

1. S1 = “α”, in codebook as “A”
   P1 = “α” → S1P1 = “αα”, not in codebook,
   S1P1 = “αα” is assigned to the first available symbol “E” in codebook,
   First character in output is code for α = “A”
   S2 → P1 = “α”

2. P2 = third character = “α”
   S2P2 = “αα”, already exists in codebook as symbol “E”
   S3 → S2P2 = “αα”

3. P3 = “β”, S3P3 = “ααβ”, not in codebook
   S3P3 = “ααβ” assigned to “F” in codebook
   Second character in coded image = “E” = “αα”
   S4 → P3 = “β”

4. P4 = “α”, S4P4 = “βα”, not in codebook
   S4P4 = “βα” assigned to “G” in codebook
   Third character in coded image = “B” = “β”
   S5 → P4 = “α”

5. P5 = “α”, S5P5 = “αα” exists in codebook as “E”
   S6 → “αα”

6. P6 = “γ”, S6P6 = “ααγ”, not in codebook
   S6P6 = “ααγ” assigned to “H” in codebook
   Fourth character in coded image = “E”, code = “AEBE···”
   S7 = “γ”

7. ···

After all pixels have been interrogated (and if I made no mistakes), the codebook is:

Symbol    String
A         α
B         β
C         γ
D         δ
E         αα
F         ααβ
G         βα
H         ααγ
I         γα
J         ααδ
K         δα
L         ααα
M         αβ
N         βαα
O         ααγα
P         ααβα

The entire coded message is “AEBECEDEAGHFPLLMH”. Note that new elements are added to the codebook quite rapidly, and the later elements typically represent longer and longer sequences. When the codebook is full, the last element (or the least used) can be deleted, and the process can proceed with the available elements. The coded image requires 17 characters at 4 bits per character, for a total of 68 bits, which is not much of a reduction when compared to the original quantity of 72 bits.

If a 3-bit code had been used, the last character in the codebook would be H, and the coded message would be AEBECEDEAGHFFEEEFH, for a total of 18 3-bit characters, or 54 bits. Note that the total number of bits for the 3-bit code is less than for the 4-bit code (which is not the usual case). This illustrates the sensitivity of the process to the local image statistics.
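The bit counts for the two codeword lengths can be checked with a short Python sketch. It assumes the 36-pixel image of the example with the letters a, b, c, d standing for α, β, γ, δ, and it assumes that no new entries are added once the table is full; the function name `lzw_code_count` is hypothetical.

```python
def lzw_code_count(pixels, alphabet, k):
    """Number of k-bit codewords emitted by a greedy LZW parse of `pixels`."""
    table = set(alphabet)
    count, s = 0, pixels[0]
    for p in pixels[1:]:
        if s + p in table:
            s += p
        else:
            count += 1                     # emit the code for s
            if len(table) < 2 ** k:        # add a new entry only if codewords remain
                table.add(s + p)
            s = p
    return count + 1                       # final string

# 36-pixel example image; a, b, c, d stand for the four gray levels.
image = "aaab" "aac" "aad" "aaab" "aaac" "aab" "aab" + "a" * 8 + "baac"
for k in (3, 4):
    n = lzw_code_count(image, "abcd", k)
    print(k, n, n * k)
```

Under these assumptions the sketch gives 18 codewords (54 bits) for k = 3 and 17 codewords (68 bits) for k = 4, against 72 bits for the uncoded image.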

As mentioned above, the LZW algorithm was used in most PC file compressors (archiving programs such as “PKARC” and “PKZIP”) back in the days of small disk drives. In those applications, the files consisted of 8-bit ASCII characters. LZW with a 13-bit codeword was known as “squashing”, while 12-bit LZW was “crunching”.

20.5 Transform Coding

The various lossless compression schemes (Huffman, arithmetic, and LZW coding) are useful for compressing images with clustered histograms (scalar or vector). However, the compression ratios are fixed by the statistics of the images. In many cases, “lossy” coding is useful, where nonredundant information is discarded. If the loss of this information is not objectionable to the viewer, then the reduction in storage requirements may well be beneficial. Probably the most familiar example of transform compression is JPEG encoding.

One convenient method for implementing lossy coding is to first construct a new representation of the original image via an invertible image transformation. Such a transformation may be a gray-scale remapping (lookup tables) for monochrome images, a color-space transformation for color images, or a shift-invariant or shift-variant spatial transformation. These transformations “reorganize” the gray values of the image and thus change the correlation properties of the image. It is possible to compress the reorganized gray values using one (or more) of the schemes already considered.

In the discussion of image processing operations, we have seen that images may be recast into a different (and possibly equivalent) form by an image transformation. The general form for the transformation of a 1-D vector is:

$$F[\ell] = \sum_n f[n]\, m[\ell, n]$$

where m[ℓ,n] is the 2-D “reference” or “mask” function of the transform. A familiar such function is the Fourier transform, where $m[\ell,n] = \exp\left[-2\pi i\,\frac{n\ell}{N}\right]$. If the transform is invertible, then there exists a mask function M[n,ℓ] such that:

$$f[n] = \sum_\ell F[\ell]\, M[n, \ell]$$

The transformation is a space-invariant convolution if the mask function m[ℓ,n] has the form of a convolution kernel: m[ℓ − n] = h[ℓ − n]. In the study of linear systems, it is demonstrated that convolution with an impulse response h[n] is invertible if the transfer function H[k] is nonzero everywhere. A space-variant transformation is invertible if the set of mask functions m[ℓ,n] is complete. In either case, the gray level F at pixel ℓ1 of the transformed image is a measure of the similarity of the input image f[n] and the specific mask m[ℓ1,n]. As such, the transformed image pixels F[ℓ1] are measures of the correlation of f[n] and m[ℓ1,n]; the greater the similarity between image and mask, the larger the amplitude of F at ℓ1.
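The forward and inverse sums can be sketched directly in Python for the Fourier mask. This is a slow, literal evaluation of the two sums (not a fast algorithm); the inverse mask is taken to be M[n,ℓ] = exp[+2πi nℓ/N]/N, and the function names `forward` and `inverse` are assumptions for the example.

```python
import cmath

def forward(f):
    """F[l] = sum over n of f[n] * m[l, n], with m[l, n] = exp(-2*pi*i*n*l/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * n * l / N) for n in range(N))
            for l in range(N)]

def inverse(F):
    """f[n] = sum over l of F[l] * M[n, l], with M[n, l] = exp(+2*pi*i*n*l/N) / N."""
    N = len(F)
    return [sum(F[l] * cmath.exp(2j * cmath.pi * n * l / N) for l in range(N)) / N
            for n in range(N)]

f = [1, 2, 3, 4]
F = forward(f)
print([round(abs(v), 3) for v in F])            # similarity of f to each mask
print([round(v.real, 6) for v in inverse(F)])   # recovers f to rounding error
```

Each output sample F[ℓ] is the inner product of the whole input with one mask, which is why it measures the similarity between image and mask.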

The goal of transform coding is to generate an image via an invertible transform whose histogram is less flat (more clustered) than that of the original image, thus having less entropy and requiring fewer bits for storage/transmission. Many authors describe the operation as compacting the image information into a small number of coefficients (corresponding to pixels of the transformed image), though I prefer the picture of the transform as generating an image with a more clustered histogram. Though transform coding often clusters the histogram, the lunch is not free; we pay by increasing the number of possible gray levels to be encoded and thus the number of bits per recorded pixel if the image is to be recovered without loss. In the example of encoding the image derivative considered below, the transformed image F[ℓ] may occupy up to twice as many levels as the input. If the transform F[ℓ] is requantized to the same number of levels before storage/transmission, some information is lost and f[n] cannot be recovered perfectly. This is one example of lossy coding where the compression efficiency can be quite high. If the statistics of the source can be quantified (e.g., discrete memoryless source, Markov source, etc.), it is possible to quantify the effect of reducing the number of encoded levels on the fidelity of the final image. These studies form a branch of information theory which is known as rate-distortion theory.


Block diagram of transform compression. The original image f[n, m] is converted to a different coordinate system via the invertible transformation T{f[n, m]} = F[k, ℓ], which is then quantized and Huffman coded. The quantization step ensures that the recovered image is generally an estimate $\hat{f}[n, m]$.

To introduce the concept of compression via image transformations, consider a 64-pixel, 6-bit, 1-D image f[n] of a linear ramp in the interval −32 ≤ n ≤ 31, where n is the pixel address. The histogram of this image is flat with a population of one pixel per bin, and therefore the information content of the image is 6 bits per pixel.


1-D 6-bit “ramp” image f[n] = n + 32 for −32 ≤ n ≤ 31. The histogram is “flat” with one count per gray level. The information content is 6 bits per pixel.


Because the slope of f[n] is constant, the image of its derivative is constant also. Recall that discrete differentiation can be implemented as a convolution:

$$\frac{d}{dn} f[n] = f[n] * h[n], \qquad h[n] = \begin{bmatrix} +1 & -1 & 0 \end{bmatrix}$$

For this derivative operation, the discrete transfer function (discrete Fourier transform of the impulse response h[n]) is:

$$H[k] = |k|$$

and is nonzero at all samples k except at the origin (i.e., at zero frequency, the constant part or average value). The derivative of a 1-D image is invertible if the average (or initial) value is known as a boundary condition. In this example, the histogram of the derivative has only two occupied gray levels, and the entropy of the derivative image is 0.116 bits/pixel. This transformed image can be encoded by one-bit characters with a compression efficiency of η = 6/0.116 ≅ 52, which means that this coding scheme produced a bit rate that is very much smaller than the Shannon limit. How is this possible? Because we encoded a quality of groups of characters rather than the individuals.
Derivative of the ramp image: f′[n] = f[n] − f[n − 1]; the “gray value” of the first pixel is “0” and all of the rest are “1”. The histogram has 63 pixels at 1 and 1 pixel at 0, for an information content of 0.116 bits per pixel.
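The two entropy values quoted for the ramp and its derivative can be reproduced with a few lines of Python. The sketch assumes that the pixel preceding the first one has gray value 0, so that the first derivative sample is 0; the function name `entropy_bits` is hypothetical.

```python
from collections import Counter
from math import log2

def entropy_bits(values):
    """First-order entropy (bits per pixel) of the histogram of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

ramp = [n + 32 for n in range(-32, 32)]                      # f[n] = n + 32
deriv = [ramp[0]] + [b - a for a, b in zip(ramp, ramp[1:])]  # f[n] - f[n-1], f[-33] taken as 0

print(entropy_bits(ramp))              # flat histogram: 6 bits per pixel
print(round(entropy_bits(deriv), 3))   # 63 pixels at 1, 1 pixel at 0
```

The ratio of the two entropies is the factor of about 52 quoted in the text.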


Obviously, derivative images can have negative gray values (though it did not in this case). In fact, if the dynamic range of f[n] is 64 levels in the interval [0,63] (6 bits per pixel), the theoretical dynamic range of the derivative image is 127 levels in the interval [−63,63] (not quite 7 bits per pixel). So although the histogram of the derivative image may have less entropy if the pixels of the original image are well correlated, an additional bit per pixel must be stored if the image is to be recovered without loss. If some error in the uncompressed image is tolerable, sparsely populated levels in the transform may be substituted with a similar gray level, or levels with small amplitudes may be quantized to 0. These can significantly reduce the information content. The latter process is called coring by some in the image compression community.

The process of encoding the derivative image rather than the original is the basis for both run-length encoding and differential pulse code modulation (DPCM). These are examples of predictive coding, where a reduction in entropy is obtained by transmitting the difference between the actual gray value at a pixel and a prediction obtained by some rule. In run-length encoding of images with large uniform regions (e.g., binary text images for FAX machines), the transmitted data are the number of consecutive pixels with the same gray level before the next switch. If the strings of 0s and 1s are long, run-length encoding can reduce the data stream significantly. In DPCM, the gray value at a pixel is predicted from some linear combination of previous gray levels; the error e between the actual gray level and the prediction is quantized and encoded. The predictors of the pixel gray value f[x, y] may include one or several previous pixels, including f[x − 1, y], f[x − 1, y − 1], f[x, y − 1], and f[x + 1, y − 1], as shown below:


f[x − 1, y − 1]    f[x, y − 1]    f[x + 1, y − 1]
f[x − 1, y]        f[x, y]
The predictor may be expressed as a linear combination of these levels:

$$\hat{f}[x,y] = \sum_{i,j} a_{ij}\, f[x-i, y-j]$$

where the $a_{ij}$ are the weights of the linear combination. Among the predictors commonly used are:

(1) 1-D first-order (based on one pixel): $\hat{f}[x_0,y_0] = f[x_0-1,y_0]$

(2) 2-D second-order (based on two pixels): $\hat{f}[x_0,y_0] = \frac{1}{2}\left(f[x_0-1,y_0] + f[x_0,y_0-1]\right)$

(3) 2-D third-order (based on three pixels):

$$\hat{f}[x_0,y_0] = \frac{1}{4}\left(3f[x_0-1,y_0] - 2f[x_0-1,y_0-1] + 3f[x_0,y_0-1]\right)$$


In the first case, the difference between the actual and predicted gray values is just the discrete first derivative:

$$e[x_0,y_0] = f[x_0,y_0] - \hat{f}[x_0,y_0] = f[x_0,y_0] - f[x_0-1,y_0] = \left.\frac{\partial f}{\partial x}\right|_{x=x_0}$$


The  2-D  predictor  usually  improves  compression  efficiency  significantly  over  1-D 
predictors  for  real  images. 

In adaptive DPCM, the mathematical form of the predictor may vary based on the image structure. The compression in DPCM results because the prediction error is quantized to fewer levels than the original image data. Note that the final image may exhibit grainy or contouring errors if the minimum quantization level is coarse and the gray levels vary slowly across the image.
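The three predictors listed above can be compared with a short Python sketch that returns the prediction-error image before quantization. It is an illustration under assumptions: the image is a list of rows indexed as img[y][x], neighbors outside the image are taken to be 0, and the function name `prediction_error` is hypothetical.

```python
def prediction_error(img, order):
    """Error e = f - f_hat for the first-, second-, or third-order predictor (order = 1, 2, 3)."""
    rows, cols = len(img), len(img[0])

    def f(x, y):
        # Assumption: pixels outside the image have gray value 0.
        return img[y][x] if 0 <= x < cols and 0 <= y < rows else 0

    err = []
    for y in range(rows):
        row = []
        for x in range(cols):
            if order == 1:
                pred = f(x - 1, y)
            elif order == 2:
                pred = (f(x - 1, y) + f(x, y - 1)) / 2
            else:
                pred = (3 * f(x - 1, y) - 2 * f(x - 1, y - 1) + 3 * f(x, y - 1)) / 4
            row.append(img[y][x] - pred)
        err.append(row)
    return err

flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
for order in (1, 2, 3):
    print(order, prediction_error(flat, order)[1])
```

Because the weights of each predictor sum to one, the error is zero in the interior of a uniform region; only the border pixels, where the assumed zero neighbors enter, produce nonzero errors.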


20.5.1  Color-Space  Transformations 


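$$\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.274 & -0.322 \\ 0.211 & -0.523 & 0.312 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} Y \\ I \\ Q \end{bmatrix}$$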

The “luminance” Y is a weighted sum of the color-value triplet [R, G, B]. Note that
the weights sum to one, which means that a gray pixel with the values R = G = B
will have a luminance equal to that common value.

$$Y = 0.299\,R + 0.587\,G + 0.114\,B$$

I and Q are the “chrominance” values and are weighted differences of the color-value
triplet, where the weights sum to zero. I is a weighted sum of green and blue
subtracted from a weighted red; in other words, it may be thought of as Red - Cyan.
The Q channel is weighted green subtracted from a weighted combination of red and
blue, or Magenta - Green.
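
As a quick check of the statements about the weights, the matrix can be applied to a gray pixel. This is a minimal sketch; the helper names `rgb_to_yiq` and `yiq_to_rgb` are chosen for illustration, and the inverse is obtained numerically rather than from published inverse coefficients.

```python
import numpy as np

# RGB -> YIQ matrix as given in the text
RGB_TO_YIQ = np.array([
    [0.299,  0.587,  0.114],   # Y: weights sum to one
    [0.596, -0.274, -0.322],   # I: weights sum to zero (Red - Cyan)
    [0.211, -0.523,  0.312],   # Q: weights sum to zero (Magenta - Green)
])

def rgb_to_yiq(rgb):
    """Transform an (..., 3) array of RGB triplets to YIQ."""
    return np.asarray(rgb, dtype=float) @ RGB_TO_YIQ.T

def yiq_to_rgb(yiq):
    """Invert the transform numerically."""
    return np.asarray(yiq, dtype=float) @ np.linalg.inv(RGB_TO_YIQ).T

gray = rgb_to_yiq([100, 100, 100])
print(gray)   # luminance equals the common value; both chrominances vanish
```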
20.5.2 Space-Variant Transformations

Most authors consider only space-variant operations as transforms for image coding;
for 1-D images, the form of the space-variant operation is:

$$F[k] = \sum_{n} f[n]\, p[n;k]$$

while for 2-D images, the form of the space-variant operation is:

$$F[k,\ell] = \sum_{n,m} f[n,m]\, p[n,m;k,\ell]$$

In words, the 2-D transform is the product of a 4-D matrix and the 2-D input. The
most common such transform in imaging is the discrete Fourier transform (DFT); the
mask $p[n;k]$ for the 1-D DFT is the set of 1-D complex-valued sinusoids with spatial
frequency $k/N$:

$$p[n;k] = \exp\left[-i\,\frac{2\pi nk}{N}\right] = \cos\left[\frac{2\pi nk}{N}\right] - i\,\sin\left[\frac{2\pi nk}{N}\right]$$


The mask $p[n,m;k,\ell]$ for 2-D images is the set of 2-D complex-valued sinusoids; they
vary in a sinusoidal fashion along one direction and are constant in the orthogonal
direction. For an N x N input image, the mathematical expression for the mask
function is:

$$p[n,m;k,\ell] = \exp\left[-\frac{2\pi i\,(nk+m\ell)}{N}\right] = \cos\left[2\pi\left(\frac{nk}{N}+\frac{m\ell}{N}\right)\right] - i\,\sin\left[2\pi\left(\frac{nk}{N}+\frac{m\ell}{N}\right)\right]$$

The spatial frequencies of the sinusoidal mask indexed by $k$ and $\ell$ are respectively
$\xi = k/N$ and $\eta = \ell/N$. The gray level of each pixel in the transformed image $F[k,\ell]$
describes the degree of similarity between $f[n,m]$ and that specific 2-D sinusoid.
Recall that the amplitudes of all pixels in a 1-D sinusoid can be completely specified
by three numbers: the magnitude, spatial frequency, and phase. In other words, the
gray value of a particular sample of a sinusoid is determined completely by any other
pixel if the parameters of the sinusoid are known; a perfect interpixel correlation
exists among the amplitudes. The DFT operation compresses the image information
into the number of bits required to represent those three parameters. Therefore,
an image composed of a small number of sinusoidal components can be compressed
to a very significant degree by converting to its Fourier representation. We begin
by considering a simple 1-D case; the image to be compressed has 64 pixels and is
a 1-D sinusoid of period 32 pixels, as shown below. The image has been sampled
but not quantized, i.e., all real numbers between 0 and 31 are allowed gray levels.
Because the Shannon entropy (information content) is defined as a sum over discrete
probabilities (gray levels), it strictly cannot apply to images with real-valued gray
levels; the image must be quantized to calculate the entropy. The histogram of the
sinusoid after quantizing to 64 bins is shown; note that it is approximately flat:

$$f[n] = 16 + 15\cos\left[\frac{2\pi n}{32}\right], \quad 0 \le f[n] \le 31$$


One definition of the 1-D DFT is:

$$F[\xi] = F[k\cdot\Delta\xi] \rightarrow F[k] = \sum_{n} f[n]\,\exp\left[-\frac{2\pi i\,nk}{N}\right]$$


Note that the transform often includes a scale factor of $N^{-1}$ that is not included
here. In words, the transform is the sum of gray values of the product of the input
image and the mask function, whose real and imaginary parts lie in the interval [-1, +1]. In general,
the transform $F[k]$ generates noninteger values that typically lie outside the dynamic
range of $f[n]$. The amplitude of the transform at k = 0 is the sum of the gray values
of the image. In the example under consideration, which has a mean gray value of 16,
the DC value of the DFT is:

$$F[k=0] = 64 \text{ pixels} \cdot 16 \text{ (mean gray)} = 1024$$

The entire discrete spectrum $F[k]$ has 61 samples with value zero, two with value
480 (located at $k = \pm 2$, so that $\xi = \pm\frac{2}{64} = \pm\frac{1}{32}$), and one with value 1024 at k = 0.
The histogram of $F[k]$ generally needs an infinite number of bins to account for the
continuously valued range of $F[k]$, but is certainly much less flat than the histogram
of $f[n]$; thus there is less entropy in $F[k]$.
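
The spectrum of this example is easy to confirm numerically. The sketch below assumes the samples are indexed n = 0, ..., 63 and uses NumPy's `np.fft.fft`, which applies the same unscaled convention (no factor of 1/N on the forward transform).

```python
import numpy as np

N = 64
n = np.arange(N)
f = 16 + 15 * np.cos(2 * np.pi * n / 32)   # sinusoid of period 32 pixels

F = np.fft.fft(f)                          # unscaled forward DFT

# Indices of the samples whose magnitude is not numerically zero
nonzero = np.flatnonzero(np.abs(F) > 1e-6)
for k in nonzero:
    print(k, round(F[k].real, 3))
# k = 0 carries the sum of the gray values (64 * 16);
# k = 2 and k = 62 (i.e., k = -2) each carry 15 * 64 / 2.
```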

The sinusoid after quantization to 32 gray levels (5 bits per pixel) and the resulting
histogram are shown below. The original image entropy is 3.890. Since the
image is quantized, it no longer is a sampled pure sinusoid and the Fourier transform
includes extra artifacts. The histogram of the quantized transform indicates that the
information content of the transform is only 0.316.

The 1-D discrete Fourier transform of an N-pixel real-valued array is an N-pixel
complex-valued array. Because each complex-valued pixel is represented as a pair
of real-valued numbers, the DFT generates twice as much data as the input array.
However, the DFT of a real-valued array is redundant, meaning that values in the
resulting DFT array are repeated. The real part of the DFT of a real-valued function
is even and the imaginary part is odd. Therefore, half of the data in the DFT of a
real-valued array may be discarded without loss.
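
The redundancy can be demonstrated with a short NumPy sketch. The random test array is an assumption made for illustration; `np.fft.rfft` keeps only the non-redundant half of the spectrum (N/2 + 1 complex samples for even N) and `np.fft.irfft` recovers the input from it.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16
f = rng.integers(0, 32, size=N).astype(float)   # hypothetical 5-bit real-valued array

F = np.fft.fft(f)
k = np.arange(1, N)

# Hermitian symmetry: F[-k] is the complex conjugate of F[k],
# so the real part is even and the imaginary part is odd.
symmetric = np.allclose(F[-k], np.conj(F[k]))
print("Hermitian:", symmetric)

half = np.fft.rfft(f)                # keeps samples k = 0 ... N/2 only
f_back = np.fft.irfft(half, n=N)     # reconstruct from the retained half
print(len(half), np.allclose(f_back, f))
```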
20.5.3 Block DFT Coding via DCT


Because the statistical properties of typical images vary across the image (i.e., the
statistics are shift variant), it is common to apply the selected transform to local
blocks of pixels (often of size 8 x 8 or 16 x 16) that are coded individually using
a spatial transform (to reduce gray-level redundancy), followed by Huffman coding.
The discrete Fourier transform (DFT) is a possible such transform, but its advantage
of generating an equivalent image which exhibits a clustered histogram often is offset
somewhat by its assumption that an input array of size N x M pixels actually is an infinite
array that is periodic over N x M blocks of pixels. In words, the DFT assumes that
the gray value of a pixel off the edge on one side of the array is identical to the gray
value on the edge of the other side of the array. Of course, it is common for pixels
at opposite edges of the array to have different gray levels (for sky at the top and
for ground at the bottom, for example). The gray-level transitions at the boundaries
of the periods of the array generate artifacts in the DFT known as leakage, or false
frequency components. Though these false frequency terms are necessary to generate
the true sharp edge boundaries of the block, the additional samples of the DFT with
non-zero amplitudes spread the histogram (and thus increase the entropy) of the transformed
image. Therefore, the potential efficiency of compression using the DFT often suffers.
An additional problem arises because the DFT of a real-valued discrete input image
is a complex-valued Hermitian array. The symmetry of the transform array (even real
part and odd imaginary part) ensures that only half of each part need be stored, but
the histograms of both parts must be accounted for when computing the entropy of the
transform coefficients.


To reduce the impact of the sharp transitions at the edges of the blocks, as well
as to obtain a transform that is real-valued for a real-valued input image, the discrete
cosine transform (DCT) may be used instead of the DFT. The DCT has become very
important in the image compression community, being the basis transformation for
the JPEG and MPEG compression standards. The DCT of an M x M block may be
viewed as the DFT of a synthetic 2M x 2M block that is created by replicating the
original M x M block after folding about the vertical and horizontal edges:

The original 4 x 4 block of image data is replicated 4 times to generate an 8 x 8
block of data via the DFT format and an 8 x 8 DCT block by appropriate reversals.
The transitions at the edges of the 4 x 4 DCT blocks do not exhibit the “sharp”
edges in the 4 x 4 DFT blocks.

The resulting 2M x 2M block exhibits smaller discontinuities at the edges. The
symmetries of the Fourier transform for a real-valued image ensure that the original
M x M block may be reconstructed from the DCT of the 2M x 2M block.

Consider the computation of the DCT for a 1-D M-pixel block $f[n]$ ($0 \le n \le M-1$).
The 2M-pixel synthetic array $g[n]$ is indexed over n ($0 \le n \le 2M-1$) and
has the form:

$$g[n] = f[n] \text{ for } 0 \le n \le M-1$$

$$g[n] = f[2M-1-n] \text{ for } M \le n \le 2M-1.$$

In the case M = 8, the array $g[n]$ is defined:

$$g[n] = \begin{cases} f[n] & \text{for } 0 \le n \le 7 \\ f[15-n] & \text{for } 8 \le n \le 15 \end{cases}$$
The values of $g[n]$ for $8 \le n \le 15$ are a “reflected replica” of $f[n]$:

g[8] = f[7]
g[9] = f[6]
g[10] = f[5]
g[11] = f[4]
g[12] = f[3]
g[13] = f[2]
g[14] = f[1]
g[15] = f[0]

If the “new” array $g[n]$ is assumed to be periodic over 2M samples, its amplitude is
defined for all n, e.g., for M = 8:

$$g[n] = f[-1-n] \text{ for } -M \le n \le -1 \implies -8 \le n \le -1$$

$$g[n] = f[n+2M] \text{ for } -2M \le n \le -M-1 \implies -16 \le n \le -9$$

Note that the 16-sample block $g[n]$ is NOT symmetric about the origin of coordinates
because $g[-1] = g[0]$; to be symmetric, $g[-\ell]$ would have to equal $g[+\ell]$. For example,
consider a 1-D example where $f[n]$ is an 8-pixel ramp as shown:

The 2M-point representation of $f[n]$ is the $g[n]$ just defined:

$$g[n] = \begin{cases} f[n] & \text{for } 0 \le n \le 7 \\ f[2M-1-n] & \text{for } 8 \le n \le 15 \end{cases}$$

If this function were symmetric (even), then circular translation of the 16-point array
by 8 pixels to generate $g[(n-8) \bmod 16]$ would also be an even function.

From the graph, it is apparent that the translated array is not symmetric about
the origin; rather, it has been translated by $-\frac{1}{2}$ pixel from symmetry in the 2M-pixel
array. Thus define a new 1-D array $c[n]$ that is shifted to the left by $\frac{1}{2}$ pixel:

$$c[n] = g\left[n - \tfrac{1}{2}\right]$$

This result may seem confusing at first; how can a sampled array be translated by $\frac{1}{2}$
pixel? For the answer, consider the continuous Fourier transform of a sampled array
translated by $\frac{1}{2}$ unit:

$$\mathcal{F}_1\{c[n]\} = C[\xi] = \mathcal{F}_1\left\{g\left[x-\tfrac{1}{2}\right]\right\} = \mathcal{F}_1\left\{g[x] * \delta\left[x-\tfrac{1}{2}\right]\right\} = G[\xi]\cdot\exp\left[-2\pi i\,\xi\cdot\tfrac{1}{2}\right]$$

$$C[\xi] = G[\xi]\cdot\exp\left[-i\pi\xi\right]$$
Thus the effect of translation by $\frac{1}{2}$ pixel on the transform is multiplication by the
specific linear-phase factor:

$$\exp[-i\pi\xi] = \cos[\pi\xi] - i\,\sin[\pi\xi].$$

The 2M-point DFT of the symmetric discrete array (original array translated by $\frac{1}{2}$
pixel) has the form:

$$\mathcal{F}_{2M}\{c[n]\} = \mathcal{F}_{2M}\left\{g[n] * \delta\left[n-\tfrac{1}{2}\right]\right\} = G[k]\cdot\exp\left[-\frac{i\pi k}{2M}\right] = G[k]\cdot\left(\cos\left[\frac{\pi k}{2M}\right] - i\,\sin\left[\frac{\pi k}{2M}\right]\right)$$

$$C[k] = G[k]\cdot\exp\left[-\frac{i\pi k}{2M}\right]$$

where the continuous spatial frequency $\xi$ has been replaced by the sampled frequency
$\frac{k}{2M}$. This function $C[k]$ is the DCT of $f[n]$. Because the 2M-point translated function
$c[n]$ is real and even, so must be the 2M-point discrete spectrum $C[k]$; therefore only
M samples of the spectrum are independent. These M samples are the DCT of $f[n]$.


Steps  in  Forward  DCT 

To summarize, the steps in the computation of the 1-D DCT of an M-point block
$f[n]$ are:

1. create a 2M-point array $g[n]$ from the M-point array $f[n]$:

$$g[n] = f[n] : 0 \le n \le M-1$$

$$g[n] = f[2M-1-n] : M \le n \le 2M-1$$

2. compute the 2M-point DFT of $g[n] \rightarrow G[k]$

3. the M-point DCT $C[k] = \exp\left[-\frac{i\pi k}{2M}\right]\cdot G[k]$ for $0 \le k \le M-1$

The entire process may be cast into the form of a single equation, though the
algebra required to get there is a bit tedious:

$$C[k] = 2\sum_{n=0}^{M-1} f[n]\,\cos\left(\pi k\cdot\frac{2n+1}{2M}\right) \text{ for } 0 \le k \le M-1$$
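
The three steps and the single-equation form can be compared numerically. The following is a minimal sketch using NumPy's FFT for the 2M-point DFT; the function names `dct_via_fft` and `dct_direct` are illustrative choices, and the 8-pixel ramp is the example block mentioned earlier.

```python
import numpy as np

def dct_via_fft(f):
    """DCT computed with the three steps: reflect, 2M-point DFT, linear phase."""
    f = np.asarray(f, dtype=float)
    M = len(f)
    g = np.concatenate([f, f[::-1]])              # g[n] = f[2M-1-n] for M <= n <= 2M-1
    G = np.fft.fft(g)                             # 2M-point DFT
    k = np.arange(M)
    C = np.exp(-1j * np.pi * k / (2 * M)) * G[:M] # half-pixel phase factor
    return C.real                                 # imaginary part is zero up to round-off

def dct_direct(f):
    """DCT from the single cosine-sum equation."""
    f = np.asarray(f, dtype=float)
    M = len(f)
    n = np.arange(M)
    k = n[:, None]
    return 2 * np.sum(f * np.cos(np.pi * k * (2 * n + 1) / (2 * M)), axis=1)

ramp = np.arange(8, dtype=float)                  # 8-pixel ramp
print(np.round(dct_via_fft(ramp), 3))
print(np.allclose(dct_via_fft(ramp), dct_direct(ramp)))
```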

Steps  in  Inverse  DCT 

The  inverse  DCT  is  generated  by  applying  the  procedures  in  the  opposite  order: 
1. create a 2M-point array $G[k]$ from the M-point DCT $C[k]$:

$$G[k] = \exp\left[+\frac{i\pi k}{2M}\right]\cdot C[k] \text{ for } 0 \le k \le M-1$$

$$G[k] = -\exp\left[+\frac{i\pi k}{2M}\right]\cdot C[2M-k] \text{ for } M+1 \le k \le 2M-1$$

2. compute the inverse 2M-point DFT of $G[k] \rightarrow g[n]$

3. $f[n] = g[n]$ for $0 \le n \le M-1$

The single expression for the inverse DCT is:

$$f[n] = \frac{1}{M}\sum_{k=0}^{M-1} w[k]\,C[k]\,\cos\left(\pi k\cdot\frac{2n+1}{2M}\right) \text{ for } 0 \le n \le M-1$$

where $w[k] = \frac{1}{2}$ for k = 0 and $w[k] = 1$ for $1 \le k \le M-1$.
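
A corresponding sketch of the inverse is given below. The steps do not specify the sample $G[M]$; the code sets it to zero, which follows from the cosine-sum form of the forward DCT evaluated at k = M, but that choice is an addition made here. The forward routine and all function names are included only to make the example self-contained.

```python
import numpy as np

def dct_direct(f):
    """Forward DCT: C[k] = 2 * sum_n f[n] cos(pi k (2n+1) / 2M)."""
    f = np.asarray(f, dtype=float)
    M = len(f)
    n = np.arange(M)
    k = n[:, None]
    return 2 * np.sum(f * np.cos(np.pi * k * (2 * n + 1) / (2 * M)), axis=1)

def idct_via_fft(C):
    """Inverse DCT by rebuilding the 2M-point spectrum G[k] and inverting the DFT."""
    C = np.asarray(C, dtype=float)
    M = len(C)
    k = np.arange(2 * M)
    G = np.zeros(2 * M, dtype=complex)
    G[:M] = np.exp(1j * np.pi * k[:M] / (2 * M)) * C
    # G[M] stays zero (assumption noted in the lead-in)
    upper = k[M + 1:]
    G[M + 1:] = -np.exp(1j * np.pi * upper / (2 * M)) * C[2 * M - upper]
    g = np.fft.ifft(G)                  # inverse 2M-point DFT (includes the 1/2M factor)
    return g[:M].real                   # f[n] = g[n] for 0 <= n <= M-1

def idct_direct(C):
    """Inverse DCT from the single expression with weights w[0] = 1/2, w[k] = 1."""
    C = np.asarray(C, dtype=float)
    M = len(C)
    k = np.arange(M)
    n = k[:, None]
    w = np.ones(M)
    w[0] = 0.5
    return np.sum(w * C * np.cos(np.pi * k * (2 * n + 1) / (2 * M)), axis=1) / M

f = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])   # arbitrary test block
C = dct_direct(f)
print(np.allclose(idct_via_fft(C), f), np.allclose(idct_direct(C), f))
```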

Forward DCT of 2-D Array

The corresponding process to compute the 2-D DCT $C[k,\ell]$ of an M x M-pixel block
$f[n,m]$ is:

1. create the 2M x 2M-pixel array $g[n,m]$:

$$g[n,m] = f[n,m] : 0 \le n,m \le M-1$$

$$g[n,m] = f[2M-1-n,\,m] : M \le n \le 2M-1,\ 0 \le m \le M-1$$

$$g[n,m] = f[n,\,2M-1-m] : 0 \le n \le M-1,\ M \le m \le 2M-1$$

$$g[n,m] = f[2M-1-n,\,2M-1-m] : M \le n,m \le 2M-1$$

2. compute the 2M x 2M-point DFT of $g[n,m] \rightarrow G[k,\ell]$

3. $C[k,\ell] = \exp\left[-\frac{i\pi(k+\ell)}{2M}\right]\cdot G[k,\ell]$ for $0 \le k,\ell \le M-1$
Inverse DCT of 2-D Array

1. create a 2M x 2M array $G[k,\ell]$ from the DCT $C[k,\ell]$:

$$G[k,\ell] = \exp\left[\frac{i\pi(k+\ell)}{2M}\right]\cdot C[k,\ell] : 0 \le k,\ell \le M-1$$

$$G[k,\ell] = -\exp\left[\frac{i\pi(k+\ell)}{2M}\right]\cdot C[2M-k,\,\ell] : M+1 \le k \le 2M-1,\ 0 \le \ell \le M-1$$

$$G[k,\ell] = -\exp\left[\frac{i\pi(k+\ell)}{2M}\right]\cdot C[k,\,2M-\ell] : 0 \le k \le M-1,\ M+1 \le \ell \le 2M-1$$

$$G[k,\ell] = \exp\left[\frac{i\pi(k+\ell)}{2M}\right]\cdot C[2M-k,\,2M-\ell] : M+1 \le k,\ell \le 2M-1$$

2. compute the inverse 2M x 2M-point DFT of $G[k,\ell] \rightarrow g[n,m]$

3. $f[n,m] = g[n,m]$ for $0 \le n,m \le M-1$


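The form of the forward DCT may be written more simply as:

$$F[k,\ell] = \frac{2}{N}\,w[k]\,w[\ell]\sum_{n=0}^{N-1}\sum_{m=0}^{N-1} f[n,m]\,\cos\left[\frac{(2n+1)\,k\pi}{2N}\right]\cos\left[\frac{(2m+1)\,\ell\pi}{2N}\right]$$

where:

$$w[j] = \begin{cases} \frac{1}{\sqrt{2}} & \text{if } j = 0 \\ 1 & \text{if } j = 1,\ldots,N-1 \end{cases}$$

and the corresponding form of the inverse 2-D DCT is:

$$f[n,m] = \frac{2}{N}\sum_{k=0}^{N-1}\sum_{\ell=0}^{N-1} w[k]\,w[\ell]\,F[k,\ell]\,\cos\left[\frac{(2n+1)\,k\pi}{2N}\right]\cos\left[\frac{(2m+1)\,\ell\pi}{2N}\right]$$

With this normalization the DC term of an 8 x 8 block is eight times the average gray
value of the block.

The action of the 2-D 8 x 8 block DCT will be detailed for blocks in the 64 x 64
5-bit image LIBERTY.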
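
The separable cosine-sum form lends itself to a matrix implementation. The sketch below assumes the normalization $\frac{2}{N}\,w[k]\,w[\ell]$, which reproduces the DC values quoted in this chapter (eight times the block average for N = 8); the names `dct_matrix`, `dct2`, and `idct2`, and the constant test block, are illustrative.

```python
import numpy as np

def dct_matrix(N=8):
    """Matrix D[k, n] = sqrt(2/N) * w[k] * cos((2n+1) k pi / 2N), w[0] = 1/sqrt(2)."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    D = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
    D[0, :] /= np.sqrt(2.0)
    return D

def dct2(f):
    """Forward 2-D DCT of a square block: F = D f D^T."""
    D = dct_matrix(f.shape[0])
    return D @ f @ D.T

def idct2(F):
    """Inverse 2-D DCT: f = D^T F D (D is orthogonal)."""
    D = dct_matrix(F.shape[0])
    return D.T @ F @ D

block = np.full((8, 8), 20.0)          # hypothetical constant block
F = dct2(block)
print(round(F[0, 0], 6))               # DC term: eight times the average gray value
print(np.allclose(idct2(F), block))    # the transform is invertible
```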
8 x 8 DCT blocks of Liberty

The image and the 8 x 8 block DCT are shown. Note the bright pixel in each
8 x 8 block; its amplitude is proportional to the average gray value of that block. The
numerical values of the block and the DCT will be shown for two cases. In the first,
the block is taken from the upper right where all gray values are white:

The corresponding 8 x 8 DCT block is:

The amplitude of the upper-left pixel in the DCT is the zero-frequency (DC) term;
its amplitude is eight times the average gray value of the block. The other terms in
the DCT are proportional to the amplitude of oscillating sinusoidal components and
often are called the AC terms; in this constant block, the oscillating terms have null
amplitude because all gray values in the block are equal.

The gray values of a second 8 x 8 block located near the center of the image are:

with an average gray value of 17.95.

The amplitudes of the DCT of the block near the center of the image are approximately
(rounded to one decimal place):


143.6   14.6   9.7   5.7   -3.3   -1.1

-30.9   -0.1
Again, the amplitude of the sample in the upper-left corner is eight times the average
gray value of 17.95. The other 63 coefficients (the AC terms) are bipolar. A negative
AC coefficient means that the particular cosine term oscillates out of phase by $\pi$
radians. The rate of oscillation of an AC component increases with distance from the
DC term, and the direction of oscillation is determined by the orientation relative
to the DC term at the origin; cosines that oscillate horizontally are specified by the
amplitudes along the first row and those that oscillate vertically by the amplitudes
in the first column. Note that the amplitudes of higher-frequency AC components
(away from the upper-left corner) tend to be smaller than the low-frequency terms
(towards the upper left). This is the usual result for realistic imagery, and is utilized
to obtain additional compression in the JPEG standard.

The DCT is useful as a transform for image compression because:

(a) it is real valued,

(b) the amplitudes are proportional to the amplitudes of sinusoids with different
oscillation rates, and

(c) the amplitudes of higher-frequency terms are smaller for realistic imagery.


Examples of Individual 8 x 8 blocks and their associated DCT arrays

Object A: $f[n,m] = \cos\left[2\pi\left(0\cdot n + \frac{m}{4}\right)\right]$, oscillates “vertically” two times in 8
samples, $\xi = 0$, $\eta = \frac{1}{4}$, average gray value = 0

Note that the DCT array is zero except in the first vertical column (horizontal
frequency = 0 cycles per pixel).

Object B: $f[n,m] = \cos\left[2\pi\left(n\xi' + m\eta'\right)\right]$, $\xi' = \frac{1}{4}\sin\left[\frac{\pi}{4}\right] \approx 0.1768$, $\eta' = \frac{1}{4}\cos\left[\frac{\pi}{4}\right]$, oscillates
two times in diagonal direction, period $X' = 0.1768^{-1} = 5.6561$

The “low-frequency” content of the signal concentrates the DCT amplitude near
the origin, but the fact that the array is only pseudoperiodic over 8 samples produces
larger amplitudes at more pixels in the DCT array.

In the next example, we compute the DCT of the individual 8 x 8 “basis images”.
Each of these images produces an 8 x 8 DCT that has all pixels at zero except one.
The object composed of these “blocks” is shown on the left and the DCT on the right.
The low-frequency terms appear in the upper left and the highest-frequency terms
to the right and bottom. This observation provides the basis for the JPEG encoding
standard that is considered next.


20.6  JPEG  Image  Compression  of  Static  Images 


A  standard  has  been  developed  by  the  “Joint  Photographic  Experts  Group”  and 
has  become  very  common  since  the  early  1990s.  In  its  original  form,  it  was  based  on 
the  DCT,  to  the  point  where  the  method  is  now  sometimes  called  “JPEG  (DCT)”. 
This  standard  is  based  on  the  properties  of  human  vision  where  the  sensitivity  of  the 
eye  generally  decreases  with  increasing  spatial  frequency. 
8-bit grayscale image used to illustrate JPEG image compression.


20.6.1  Example  1:  “Smooth”  Region 
JPEG  Encoding 


64  x  64  block  of  pixels  in  illustration 


Magnified  view  of  64  x  64  block  of  pixels 
“Smooth” Region of image of “Parrots”


Magnified  view  of  8x8  block 


Gray values of 8 x 8 block

$f[n,m] =$

110  111  111  110  111  109  112  117
112  115  111  110  108  111  129  139
111  112  107  109  120  140  142  142
108  107  119  126  142  150  145  143
111  131  137  141  152  148  147  140
138  138  141  148  144  140  147  148
141  143  150  144  139  149  151  148
143  136  137  142  148  148  138  130


The midgray value of 128 is subtracted from the values in the 8 x 8 block to produce
bipolar data.

The DCTs of the 8 x 8 blocks are evaluated (data rounded to one decimal place).

If the block were pure white (gray value 255), then the DCT would produce a single
nonzero amplitude of +1016 at the DC term and zero elsewhere; if the block were
black, the DC coefficient of the DCT would be -1024.

The  DCT  of  the  64  8  x  8  blocks  can  be  displayed  in  pictorial  form: 


DCT values are divided by the values in a normalization matrix that accounts for
the contrast sensitivity function of the eye. Smaller numbers imply more weighting
applied. The largest numbers apply to the high-frequency terms positioned toward the
lower right corner. Note that the DC component (less 128) is divided by a larger
number than the neighboring low-frequency AC components.

Normalized 8 x 8 block displayed as an image.

Quantization rounds the values in the block to the nearest integer; note the large
number of null coefficients:

$\mathrm{integer}\left\{\frac{F[k,\ell]}{N[k,\ell]}\right\} =$

 2  -5   0   0   0   0   0   0
-8  -2   1   0   0   0   0   0
-2   3   1   0   0   0   0   0
 1   1   0  -1   0   0   0   0
 0   0  -1   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0

The 2-D 8 x 8 block is scanned in a “zig-zag” format to convert the 2-D block to a 1-D
sequence of coefficients. Because of the layout of the scan, the largest coefficients
should appear first for most “real” images.

The zig-zag path traveled during Huffman coding of the quantized DCT coefficients.

String of Characters:

The DC coefficients of blocks are encoded separately using a differential coding
scheme. The AC coefficients are entropy coded (Huffman code) using
three pieces of information:

1. the run length of consecutive zeros that preceded the current element in the
zigzag sequence

2. the number of bits to follow in the amplitude number

3. the amplitude of the coefficient

Bit Count    Amplitudes
 1           -1, +1
 2           -3, -2, +2, +3
 3           -7 to -4, +4 to +7
 4           -15 to -8, +8 to +15
 5           -31 to -16, +16 to +31
 6           -63 to -32, +32 to +63
 7           -127 to -64, +64 to +127
 8           -255 to -128, +128 to +255
 9           -511 to -256, +256 to +511
10           -1023 to -512, +512 to +1023

2 (DC term coded separately)  -5  -8  -2  1  3  1  0  0  0  1  1
0 0 0 0 0 0 0 0 0 0  -1  -1  (0 x 39)

This string is encoded using a predetermined Huffman code based on the number
of zeros in the string to the next nonzero coefficient and the numerical value of the
coefficient. Short strings of zeros are encoded with shorter sequences.

(0,3) (0,4) (0,2) (0,1) (0,2) (0,1) (2,1) (0,1) (10,1) (0,1) EOB

The Huffman code has the partial form:

100+0010 1011+0+100 01+0+
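
The scan and the three pieces of information per AC coefficient can be sketched as follows. This is only an illustration under stated assumptions: the scan starts toward the right as in the JPEG standard (the figure's orientation may differ), the array is indexed [row, column], the quantized block is the one shown above as read from the printed matrix, and the final Huffman table lookup is omitted. The function names are hypothetical.

```python
import numpy as np

def zigzag_indices(n=8):
    """(row, col) pairs in zig-zag order for an n x n block (scan starts to the right)."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col labels each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(r, s - r) for r in rows]            # listed with row increasing
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order

def size_category(v):
    """Number of bits needed for the amplitude (the 'Bit Count' column)."""
    return abs(int(v)).bit_length()

def run_length_pairs(ac):
    """(zero run, bit count, amplitude) for each nonzero AC coefficient, then EOB."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, size_category(v), int(v)))
            run = 0
    pairs.append("EOB")                              # trailing zeros collapse into end-of-block
    return pairs

quantized = np.array([
    [ 2, -5,  0,  0, 0, 0, 0, 0],
    [-8, -2,  1,  0, 0, 0, 0, 0],
    [-2,  3,  1,  0, 0, 0, 0, 0],
    [ 1,  1,  0, -1, 0, 0, 0, 0],
    [ 0,  0, -1,  0, 0, 0, 0, 0],
    [ 0,  0,  0,  0, 0, 0, 0, 0],
    [ 0,  0,  0,  0, 0, 0, 0, 0],
    [ 0,  0,  0,  0, 0, 0, 0, 0],
])

sequence = [int(quantized[r, c]) for r, c in zigzag_indices(8)]
pairs = run_length_pairs(sequence[1:])               # DC term is coded separately
print(sequence[:25])
print(pairs)
```

A real JPEG coder also limits each zero run to 15 by inserting a special symbol; that detail is left out of this sketch.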
Recover from JPEG coding:

First we reconstruct the approximate DCT from the Huffman code and multiplication
by the normalization matrix:

0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0

The inverse DCT produces an 8 x 8 block of bipolar “gray values”:

-22.6  -16.3  -11.2  -13.0  -18.5  -19.6  -13.6   -6.6
-12.7  -18.4  -24.6  -25.7  -19.5   -8.7    1.8    8.1
-14.2  -22.1  -27.2  -20.4   -4.3   10.0   15.5   15.1
-23.4  -19.1   -8.8   -6.1   19.3   23.9   19.5   13.5
-15.1   -2.8   12.9   22.6   24.1   21.9   20.9   21.4
  8.8   15.4   20.2   17.7   12.1   12.9   22.5   32.3
 20.1   18.8   16.0   12.7   11.2   13.4   18.5   22.8
 14.8   11.8   11.4   16.8   22.8   21.4   11.4    1.6

These values are rounded and the constant 128 is added back to obtain the recovered
block, which is compared to the original values:

$\hat{f}[n,m] =$

105  112  117  115  110  108  114  121
115  110  103  102  108  119  130  136
114  106  101  108  124  138  143  143
105  109  119  122  147  152  148  142
113  125  141  151  152  150  149  149
137  143  148  146  140  141  150  160
148  147  144  141  139  141  147  151
142  140  139  145  151  149  139  130

$f[n,m] =$

110  111  111  110  111  109  112  117
112  115  111  110  108  111  129  139
111  112  107  109  120  140  142  142
108  107  119  126  142  150  145  143
111  131  137  141  152  148  147  140
138  138  141  148  144  140  147  148
141  143  150  144  139  149  151  148
143  136  137  142  148  148  138  130
The error in the block is obtained by subtracting the recovered image from the original:

$\varepsilon[n,m] = f[n,m] - \hat{f}[n,m] =$

+5   -1   -6   -5   +1   +1   -2   -4
-3   +5   +8   +8    0   -8   -1   +3
-3   +6   +6   +1   -4   +2   -1   -1
+3   -2    0   +4   -5   -2   -3   +1
-2   +6   -4  -10    0   -2   -2   -9
+1   -5   -7   +2   +4   -1   -3  -12
-7   -4   +6   +3    0   +8   +4   -3
+1   -4   -2   -3   -3   -1   -1    0
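
The whole encode and decode cycle for this block can be sketched in Python. The normalization matrix itself is not reproduced in the text; the code below assumes the example luminance quantization table from the JPEG standard, which is consistent with the renormalized coefficients listed in Example 2. The DCT normalization and the helper name `dct_matrix` are likewise assumptions, so small differences from the printed values are possible.

```python
import numpy as np

# Gray values of the 8 x 8 "smooth" block from the text
f = np.array([
    [110, 111, 111, 110, 111, 109, 112, 117],
    [112, 115, 111, 110, 108, 111, 129, 139],
    [111, 112, 107, 109, 120, 140, 142, 142],
    [108, 107, 119, 126, 142, 150, 145, 143],
    [111, 131, 137, 141, 152, 148, 147, 140],
    [138, 138, 141, 148, 144, 140, 147, 148],
    [141, 143, 150, 144, 139, 149, 151, 148],
    [143, 136, 137, 142, 148, 148, 138, 130],
], dtype=float)

# ASSUMPTION: example luminance quantization table from the JPEG standard
N_TABLE = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=float)

def dct_matrix(n=8):
    """Orthogonal DCT matrix: sqrt(2/n) * w[k] * cos((2x+1) k pi / 2n)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    d[0, :] /= np.sqrt(2.0)
    return d

D = dct_matrix(8)
F = D @ (f - 128.0) @ D.T                  # level shift, then forward DCT
q = np.round(F / N_TABLE).astype(int)      # normalize and round (the lossy step)
F_hat = q * N_TABLE                        # decoder: renormalize
bipolar = D.T @ F_hat @ D                  # inverse DCT
f_hat = np.round(bipolar) + 128.0          # round and add back the constant
error = f - f_hat

print(q)
print(error.astype(int))
```

Only the rounding step discards information; every other step is exactly invertible.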


20.6.2  Example  2:  “Busy”  Image 

Consider the central 8 x 8 block of the “Liberty” image, with gray values:

$f[n,m] =$

160  184   96  160  192  136  168  152
176  152  160  144  160   88   80  128
 56   64  120  168  120   72   64  200
112   56   40   56  136  120   48  192
168  128   40   56  136  120   48  192
176   40  112   24   88  120   64  160
152   48  168   48   16  120    8    8
 48  176  160  128   16   32   96  104

Subtract the constant:

$f[n,m] - 128 =$

 32   56  -32   32    64    8    40    24
 48   24   32   16    32  -40   -48     0
-72  -64   -8   40    -8  -56   -64    72
-16  -72  -88  -72     8   -8   -80    64
 40    0  -88  -72     8   -8   -80    64
 48  -88  -16 -104   -40   -8   -64    32
 24  -80   40  -80  -112   -8  -120  -120
-80   48   32    0  -112  -96   -32   -24



Evaluate the 8 x 8 DCT

$F[k,\ell] =$

-164.0   40.5   79.7  -56.8    86.0  -31.2    77.4  -32.3
 161.7  -55.2  -38.6   14.4    90.5  -28.3   -83.6   -3.2
  72.4   89.6  -70.4  -27.5  -107.1   27.4   -71.4   35.3
  60.4    8.6   48.5  156.9     8.5  -35.0    35.9   54.3
  34.0  -39.0   45.1  -14.1     0.0  -24.3  -105.3  -66.5
 -70.0  -16.2  -16.3   12.4   -14.8   68.9    46.6  -34.4
  28.5  -92.1    0.6  -50.2    18.4  -12.1   -33.6    8.7
  13.6    0.3  -40.6   19.2     0.7  -43.7    -7.2    9.4

Normalize by the weighting matrix

$\frac{F[k,\ell]}{N[k,\ell]} =$

-10.3   3.7   8.0  -3.6   3.6  -0.8   1.5  -0.5
 13.5  -4.6  -2.8   0.8   3.5  -0.5  -1.4  -0.1
  5.2   6.9  -4.4  -1.1  -2.7   0.5  -1.0   0.6
  4.3   0.5   2.2   5.4   0.2  -0.4   0.4   0.9
  1.9  -1.8   1.2  -0.3   0.0  -0.2  -1.0  -0.9
 -2.9  -0.5  -0.3   0.2  -0.2   0.7   0.4  -0.4
  0.6  -1.4   0.0  -0.6   0.2  -0.1  -0.3   0.1
  0.2   0.0  -0.4   0.2   0.0  -0.4  -0.1   0.1

Round to nearest integer

$\mathrm{integer}\left\{\frac{F[k,\ell]}{N[k,\ell]}\right\} =$

-10   4   8  -4   4  -1   2  -1
 13  -5  -3   1   3   0  -1   0
  5   7  -4  -1  -3   0  -1   1
  4   1   2   5   0   0   0   1
  2  -2   1   0   0   0  -1  -1
 -3   0   0   0   0   1   0   0
  1  -1   0  -1   0   0   0   0
  0   0   0   0   0   0   0   0



Renormalize by multiplying by $N[k,\ell]$:

$\hat{F}[k,\ell] = N[k,\ell]\cdot\mathrm{integer}\left\{\frac{F[k,\ell]}{N[k,\ell]}\right\} =$

-160   44   80  -64    96  -40   102  -61
 156  -60  -42   19    78    0   -60    0
  70   91  -64  -24  -120    0   -69   56
  56   17   44  145     0    0     0   62
  36  -44   37    0     0    0  -103  -77
 -72    0    0    0     0  104     0    0
  48  -64    0  -87     0    0     0    0
   0    0    0    0     0    0     0    0

Calculate the inverse 8 x 8 DCT

$\hat{f}[n,m] - 128 =$

 42   45   -14   41    41   -4    59    10
 30   21    19   -6    52    1   -79     8
-56  -72    21   45   -47  -47   -86    96
-11  -79  -104   26   -99  -43   -74    58
 44  -17  -104  -48   -28    8   -81    70
 39  -65     6  -83   -36   -8   -81    32
 31  -70    13  -87  -105  -52  -105  -113
-85   64    37  -12  -100  -61   -72    -3

Add back the constant:

$\hat{f}[n,m] =$

170  173  114  169  169  124  187  138
158  149  147  122  180  129   49  136
 72   56  149  173   81   81   42  224
117   49   24  154   29   85   54  186
172  111   24   80  100  136   47  198
167   63  134   45   92  120   47  160
159   58  141   41   23   76   23   15
 43  192  165  116   28   67   56  125


The error is the difference:

$\varepsilon[n,m] = f[n,m] - \hat{f}[n,m] =$

-10   11  -18   -9   23   12  -19   14
 18    3   13   22  -20  -41   31   -8
-16    8  -29   -5   39   -9   22  -24
 -5    7   16  -98  107   35   -6    6
 -4   17   16  -24   36  -16    1   -6
  9  -23  -22  -21   -4    0   17    0
 -7  -10   27    7   -7   44  -15   -7
  5  -16   -5   12  -12  -35   40  -21


Note  that  the  error  is  much  larger  in  several  of  the  locations,  because  the  high- 
frequency  coefficients  that  are  necessary  to  produce  the  “sharp  edges”  have  been 
quantized  to  zero. 


